Page MenuHomePhabricator

Decrypt encrypted web notifications in service worker
ClosedPublic

Authored by marcin on Oct 31 2023, 10:20 AM.
Tags
None
Referenced Files
F3371617: D9661.diff
Tue, Nov 26, 5:20 AM
Unknown Object (File)
Tue, Oct 29, 5:27 AM
Unknown Object (File)
Tue, Oct 29, 5:27 AM
Unknown Object (File)
Tue, Oct 29, 5:25 AM
Unknown Object (File)
Tue, Oct 29, 5:24 AM
Unknown Object (File)
Tue, Oct 29, 5:24 AM
Unknown Object (File)
Tue, Oct 29, 5:21 AM
Unknown Object (File)
Tue, Oct 29, 5:19 AM
Subscribers

Details

Summary

This differential decrypts encrypted notification in service-worker on web.

Test Plan

Execute the test plan below for all browsers supported by Comm - Safari, Firefox and Chrome.

Step 1: Test single notification decryption correctness.

  1. Change FUTURE_CODE_VERSION to 0 (to enable notification encryption in your local environment).
  2. Send notifications to web app.
  3. Ensure that decrypted content is displayed.
  4. Using MariaDB console ensure that version of olm session associated with cookie id of currently logged in user increments with each notification.

Step 2: Test large notifications influx:

  1. Apply the following diff to the keyserver send.js code: https://gist.github.com/marcinwasowicz/8a7f5d9171d3d2df7408f26176639be4
  2. Send one message to web client.
  3. Expect to observe 300 notifications.
  4. Ensure that all notifications that are correctly decrypted
  5. If there appears incorrectly decrypted notification then ensure that:
    • there is very few of them (1-5)
    • they appear as generic Comm notification, with error message for dev and not body for prod.
    • they can still deep link to the web app but not necessarily to the correct thread.

Repeat for prod and dev.

Diff Detail

Repository
rCOMM Comm
Branch
marcin/eng-5191-v2
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

Two things to note:

  1. This differential is not 100% correct - we still need to display something even if decryption fails (context is here: https://github.com/w3c/push-api/issues/313 and here: https://pushpad.xyz/blog/silent-push-sending-web-push-messages-without-displaying-notifications). We will need to pass to service worker an information if currently logged in user is staff and based on that information display actual error or sth generic like: Comm message. I haven't done this work yet bu I decided to put the diff to get initial feedback on decryption and prove that my work on the goal is almost done.
  2. Web notification don't work in my environment at all (even current prod). I will have to figure it out or ask a team member to patch and execute test plan. I was testing this code by logging notitication after decryption in service worker (this parts works for me - only displaying doesn't work in my environment).

Display notification with error message to staff users and blank, generic notification to non-staff users.

Two things to note:

  1. This differential is not 100% correct - we still need to display something even if decryption fails (context is here: https://github.com/w3c/push-api/issues/313 and here: https://pushpad.xyz/blog/silent-push-sending-web-push-messages-without-displaying-notifications). We will need to pass to service worker an information if currently logged in user is staff and based on that information display actual error or sth generic like: Comm message. I haven't done this work yet bu I decided to put the diff to get initial feedback on decryption and prove that my work on the goal is almost done.
  2. Web notification don't work in my environment at all (even current prod). I will have to figure it out or ask a team member to patch and execute test plan. I was testing this code by logging notitication after decryption in service worker (this parts works for me - only displaying doesn't work in my environment).

Both are outdated.

Fix: get notification title from plainTextData instead of data (undefined if data is encrypted).

web/push-notif/notif-crypto-utils.js
18–21

I think in the codebase wy try to avoid using $Diff. It's better to define PlainTextWebNotificationPayload as separate type, and PlainTextWebNotification as id and spreading PlainTextWebNotificationPayload

48

I think something like Necessary data not found in IndexedDB sounds a bit better, because current error makes me feel that data should be send from main thread using message or event

96

shouldn't this be e.message?

web/push-notif/service-worker.js
38–39

can we move https://web.comm.app/favicon.ico to a const to avoid copy pasting URL multiple times?

  1. Use new plain text web notification payload type.
  2. Replace e -> e.message.
  3. Extract comm icon url to constant.
  4. Update error message when utilsData are missing.

Handle Safari bug when storing CryptoKeys

web/push-notif/notif-crypto-utils.js
50 ↗(On Diff #32766)

Might be good to explain why Safari is special-cased here with a comment

Could we amend the test plan with testing on safari and testing of incorrectly decrypted notifications?

This revision is now accepted and ready to land.Nov 7 2023, 3:43 AM
marcin edited the test plan for this revision. (Show Details)

Explain special treatment of safari browser.

Implement "debounce" approach to resolve the risk of skipped keys overflow when receiving large number of
notifications at once

This seems like a significant change. @marcin, should you consider re-requesting review?

Re-requesting review since decryption strategy changed significantly. The algorithm works like this:

  1. Instead of having just one olm session, we keep two sessions: mainSession and pendingSessionUpdate. We also keep track of the time stamp when pendingSessionUpdate was created (updateCreationTimestamp). Initially (session instantiation with the keyserver) there is only mainSession.
  2. When notification arrives we firstly check if there is pendingSessionUpdate and if the difference between updateCreationTimestamp and current time stamp (Date.now()) is greater than some constant.
  3. If it is then we:
    • try to decrypt notification with pendingSessionUpdate. If the decryption is successful then mainSession becomes pendingSessionUpdate. pendingSessionUpdate becomes the state of the session after decryption. updateCreationTimestamp is updated to Date.now(). If the decryption fails we decrypt with the mainSession and pendingSessionUpdate becomes the state of mainSession after decryption. updateCreationTimestamp is updated to Date.now().
  4. If it is not then we:
    • Decrypt with the mainSession and pendingSessionUpdate becomes the state of mainSession after decryption. updateCreationTimestamp is updated to Date.now().
  5. Finally all updated data are bundled together and stored in IndexedDB.

I'm confused by the description above.

When notification arrives we firstly check if there is pendingSessionUpdate and if the difference between updateCreationTimestamp and current time stamp (Date.now()) is greater than some constant.

So if the difference in time is too small, then this condition would fail, and pendingSessionUpdate would be ignored? This is a little confusing to me – I figured we would always try with the pendingSessionUpdate first.

If the decryption is successful then mainSession becomes pendingSessionUpdate

I thought that we would "debounce" here, meaning that mainSession would not become pendingSessionUpdate until some delay. That way we are able to keep the old mainSession around in case it's needed.

I get your point @ashoat and it is true that starting with pendingSessionUpdate has some benefits since we start from more advanced position in keys chain which has better performance. However it leads to some confusions that need to be resolved.

  1. Assuming that decryption with pendingSessionUpdate succeeds we are left with two states: pendingSessionUpdate, updateCreationTimestamp and newPendingSessionUpdate, newUpdateCreationTimestamp.

If the decryption is successful then mainSession becomes pendingSessionUpdate

I thought that we would "debounce" here, meaning that mainSession would not become pendingSessionUpdate until some delay. That way we are able to keep the old mainSession around in case it's needed.

If I understand your suggestion correctly then here we check if difference between updateCreationTimestamp and Date.now() is greater than constant and if it is then mainSession is updated. It can be updated with pendingSessionUpdate or newPendingSessionUpdate. I think it is better to use pendingSessionUpdate to keep some spare space for skipped keys between those two. Regardless of the difference between updateCreationTimestamp and Date.now() newPendingSessionUpdate, newUpdateCreationTimestamp replace their older versions.

  1. Assuming that decryption with pendingSessionUpdate fails we decrypt with mainSession and then either:
    1. update nothing and return OR
    2. update pendingSessionUpdate with olm state resulting from decrypting with mainSession (it is going back in the keys chain) OR
    3. keep pendingSessionUpdate unchanged but bump updateCreationTimestamp to Date.now() to reduce the chance that on next notification mainSession will be update with pendingSessionUpdate that has just proved to be unable to decrypt a notification.

I don't like idea 2. since we should aim to be as far in the keys chain as possible while keeping some insurance to be able to decrypt everything. I suggest idea 3. to keep pendingSessionUpdate but bump timestamp to wait longer to overwrite mainSession with it.

I order to reduce the number of if statements I think we can create pendingSessionUpdate and updateCreationTimestamp and the time of session initialization with the keyserver. The value of pendingSessionUpdate will be the copy of mainSession

Change debounce approach. Always try to decrypt with pendingSessionUpdate first.

To be frank I was confused before and I'm more confused now.

Let's follow up in our 1:1 on Monday. Meanwhile, some quick clarifications:

  1. In a debounce approach, the debounced function always executes on a timer some time AFTER the last time the debounced function is called.
    • We can't use a timer in the NSE since it won't be running after some time. Instead, we'll need to use a timestamp, and to check that timestamp the next time the NSE is triggered, BEFORE processing the new notif.
    • That means that pendingSessionUpdate will never be saved to mainSession after a notif is processed. Instead, it will happen BEFORE a notif is processed.
  2. If there is a pendingSessionUpdate, we always want to check it first.
    • If the "BEFORE" step does not result in setting mainSession = pendingSession (and thus deleting pendingSession), then when the new notif is processed, we should always start by trying the pendingSession.
    • Trying the mainSession will lead to unnecessary "skipped message keys"

To be frank I was confused before and I'm more confused now.

Let's follow up in our 1:1 on Monday. Meanwhile, some quick clarifications:

  1. In a debounce approach, the debounced function always executes on a timer some time AFTER the last time the debounced function is called.
    • We can't use a timer in the NSE since it won't be running after some time. Instead, we'll need to use a timestamp, and to check that timestamp the next time the NSE is triggered, BEFORE processing the new notif.
    • That means that pendingSessionUpdate will never be saved to mainSession after a notif is processed. Instead, it will happen BEFORE a notif is processed.
  2. If there is a pendingSessionUpdate, we always want to check it first.
    • If the "BEFORE" step does not result in setting mainSession = pendingSession (and thus deleting pendingSession), then when the new notif is processed, we should always start by trying the pendingSession.
    • Trying the mainSession will lead to unnecessary "skipped message keys"

The code in the latest version of this differential matches what is stated above. The only difference is that pendngSessionUpdate is saved after successfully processing the notif using this session. I understand that is not entirely compliant with debounce pattern but it is safer approach since if we start by mainSession = pandingSessionUpdate then we have no way to recover if pendingSessionUpdate fails to decrypt the notif.

I spoke with @marcin about this in our most recent 1:1. We clarified some differences between my initial proposed approach and what @marcin has implemented here.

The only difference is in scenarios where the pendingSession is ready to be committed ("debounce" timer has run out) when a new notif is received. In that scenario:

  • My approach would immediately commit the pendingSession to mainSession BEFORE processing the notif
    • When the notif is processed, there is only mainSession to use
    • After the notif is processed, mainSession will be the pendingSession BEFORE the notif was processed, and pendingSession will that same session AFTER the notif was processed
  • @marcin's approach saves the pendingSession to a temporary variable. Let's call it oldPendingSession
    • When the notif is processed, it attempts to use oldPendingSession
    • If it is successful, then we have the same result as in my approach – mainSession becomes oldPendingSession (the pendingSession BEFORE the notif was processed), and pendingSession is that same session AFTER the notif was processed
    • However, the difference occurs if the attempt to use oldPendingSession fails. In that scenario, @marcin's approach would then try the old mainSession

When comparing the approaches, mine has the benefit of being slightly more simple. However, @marcin's has the benefit of being able to use the old mainSession if necessary.

Given that this scenario only occurs after the debounce timer concludes, I suspect that we would not need to use the mainSession. However, I don't see too much of a downside from the additional code complexity, as it has already been implemented. So I'm content to leave this approach as-is.

This revision is now accepted and ready to land.Nov 22 2023, 6:25 AM