Page MenuHomePhabricator

[Keyserver] Publish prekeys to identity service for cron job
AbandonedPublic

Authored by kamil on Jul 11 2023, 10:09 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Apr 16, 7:40 AM
Unknown Object (File)
Mon, Apr 15, 11:51 PM
Unknown Object (File)
Mon, Apr 15, 1:40 AM
Unknown Object (File)
Mon, Apr 15, 12:26 AM
Unknown Object (File)
Sun, Apr 14, 11:44 PM
Unknown Object (File)
Sun, Apr 14, 4:14 PM
Unknown Object (File)
Sat, Apr 13, 11:06 AM
Unknown Object (File)
Fri, Apr 12, 5:05 PM

Details

Summary

Have cron job for prekeys lifecycle publish
the keys to identity service.

Part of https://linear.app/comm/issue/ENG-3944

Depends on D8475

Test Plan

Run identity service + localstack

  • Start docker
  • Start localstack
  • Run terraform
  • Run identity service
cd keyserver
RUST_LOG=debug yarn dev

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

ashoat requested changes to this revision.Jul 11 2023, 12:17 PM

Talked about this offline with @jon and we're going to make some changes here. @jon will summarize

keyserver/src/cron/cron.js
117–124 ↗(On Diff #28595)
This revision now requires changes to proceed.Jul 11 2023, 12:17 PM
keyserver/src/utils/olm-utils.js
141 ↗(On Diff #28595)

Use console.warn

jon marked 2 inline comments as done.

Address feedback

keyserver/src/cron/cron.js
117–124 ↗(On Diff #28595)

summary: usage of async constructs a promise, no need to nest promises when a promise is being returned.

keyserver/src/utils/olm-utils.js
141 ↗(On Diff #28595)

applied to other usage as well.

ashoat requested changes to this revision.Jul 20 2023, 11:08 AM

Only got through the first file. You still don't understand the feedback I've been repeating on every one of your keyserver diffs. You need to step back and try to understand async/await in JS. Please make sure I don't have to give you this feedback again.

keyserver/src/cron/cron.js
117 ↗(On Diff #28897)

It should be contentAccount to match conventions. Please try to proactively identify and follow conventions... ideally we can avoid these sorts of nits during code review

118 ↗(On Diff #28897)

We've been over this 2 or 3 times at this point, haven't we? Please stop calling async functions like this. You must ALWAYS either await an async function, pass it to somewhere where it is awaited, or wrap it in handleAsyncPromise

121 ↗(On Diff #28897)

Same problem here!! I've tried explaining this to you several times. Please do some research, and make sure you don't submit another diff or diff update without 100% fully understanding this feedback that I've repeatedly shared.

This revision now requires changes to proceed.Jul 20 2023, 11:08 AM

apply camelCase, fix returning a promise vs return a closure (void).

jon added inline comments.
keyserver/src/cron/cron.js
121 ↗(On Diff #28897)

For some reason, I was mentally omitting the curly braces which changed the semantics from returning values to just executing a scope.

jon added inline comments.
keyserver/src/cron/cron.js
121 ↗(On Diff #28897)

(rust also allows for the keyword return to be omitted when the last line of a scope returns a value)

e.g.

  ...
  return result;
}
// In rust, this is the same as
   ... 
   result
}
ashoat added inline comments.
keyserver/src/utils/olm-utils.js
130 ↗(On Diff #28968)
171 ↗(On Diff #28968)

Maybe add a code comment saying "deprecated, don't use!"

This revision is now accepted and ready to land.Jul 27 2023, 11:38 AM
jon marked an inline comment as done.

Apply feedback

kamil edited reviewers, added: jon; removed: kamil.
keyserver/src/utils/olm-utils.js
157

Maybe I am missing a lot of contexts here - and if yes sorry for the confusion, but I think this diff needs to be revisited, probably this was never tested.

publish_prekeys is not a proper name, should be publishPrekeys (it was renamed in D8930) - looks like flow passed this because this code wasn't rebased

Even with the proper name, this shouldn't work as this function was never bound to JS, fixed D9106 - maybe there is some rust magic that made this work I am not aware of, but looks like this shouldn't work without this change.

I understand that I am not a reviewer but @kamil wanted me to see this diff and I would like to note one thing. This cron job will only fire if keyserver's very first prekey's were marked as published somewhere else. In practice keyserver prekeys are published for the first time here: https://github.com/CommE2E/comm/blob/master/keyserver/src/responders/keys-responders.js#L59. However the only reason prekeys are marked as published there is that at the time this code was implemented we wanted to launch e2e notifs ASAP but the identity service was not ready, so we agreed on a temporary solution to mark prekey's as published when user fetches them to initialise olm notifications session with the keyserver. That said as of this differential this line should be deleted and we should find a place to publish keyserver's prekeys for the very first time after olm accounts for the keyserver are created.

ashoat requested changes to this revision.EditedSep 12 2023, 6:48 AM

Looking at the code of our Olm fork, it looks like the only effect of calling mark_prekey_as_published() is on the result of calling unpublished_prekey(). The only place we check unpublished_prekey is in the above code, where we use it to make sure that we don't throw away an old prekey before the newer prekey is published.

@marcin correctly points out that we have some callsites today that set mark_prekey_as_published. These calls are potentially dangerous because of the following scenario:

  1. Keyserver publishes initial prekey to identity service
  2. Keyserver rotates prekey (generating a second one), but somehow fails to publish it to the identity service
    • Identity service is still using initial (first) prekey
  3. One of the above callsites erroneously marks the second prekey as published, even though it has not been published to the identity service
    • Identity service is still using initial (first) prekey
  4. Eventually the cronjob above decides to delete the old prekey
    • Identity service is still using this old prekey for the keyserver, but that prekey is now deleted from the keyserver
    • Clients will fail to initialize sessions with the keyserver, because they are using a prekey that the keyserver is not aware of

We can instead consider removing the two linked callsites of mark_prekey_as_published, as @marcin suggested. The result will be that prekey rotation will be disabled until the existing prekey is successfully published to the identity service. However, we might still have a potential issue for the existing prekeys, which are marked as published but have not been published to the identity service. We will either need to make sure these prekeys are published to the identity service using some other mechanism, or else find some way to throw them away in favor of new prekeys that are only marked as published once they have been published to the identity service.

This is admittedly a rather complex subject. I'm happy to discuss it in more depth, either async or in a meeting.

keyserver/src/utils/olm-utils.js
87–93

For consistency with shouldForgetPrekey, I think we could restructure this code:

// Our fork of Olm only remembers two prekeys at a time.
// If the new one hasn't been published, then the old one is still active.
// In that scenario, we need to avoid rotating the prekey because it will
// result in the old active prekey being discarded.
if (account.unpublished_prekey()) {
  return false;
}

const currentDate = new Date();
const lastPrekeyPublishDate = new Date(account.last_prekey_publish_time());

return currentDate - lastPrekeyPublishDate >= maxPublishedPrekeyAge;

(Note that I changed the > to a >=)

97

It might be good to add a comment here. Something like "Our fork of Olm only remembers two prekeys at a time. We have to hold onto the old one until the new one is published."

157

Thanks for identifying these issues, @kamil! Does that mean that this should be renamed to publishPrekeys before landing?

171

If this is deprecated, what is our plan for addressing existing callsites? Is there a task to track this? And will it be okay to launch Tunnelbroker to production before those existing callsites are updated?

This revision now requires changes to proceed.Sep 12 2023, 6:48 AM

Here I described how to resolve all the issues: ENG-3944
This is a complicated and complex thing so I broke this into logical parts and created a stack starting with D9419 to make it easier for reviewers.
Added @ashoat as a reviewer since he requested changes here, after that stack is done I will abandon this one.

keyserver/src/utils/olm-utils.js
87–93
97
130

fixed in D9421

157

yes, fixed and tested in D9421

171

I removed all other call sites: D9423