Page MenuHomePhabricator

Upload Android notification payload to blob service if its size exceeds limits
AbandonedPublic

Authored by marcin on Jul 21 2023, 3:03 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 27, 9:44 AM
Unknown Object (File)
Fri, Dec 27, 9:44 AM
Unknown Object (File)
Fri, Dec 27, 9:44 AM
Unknown Object (File)
Fri, Dec 27, 9:44 AM
Unknown Object (File)
Fri, Dec 27, 9:44 AM
Unknown Object (File)
Fri, Dec 27, 9:43 AM
Unknown Object (File)
Fri, Dec 27, 9:33 AM
Unknown Object (File)
Sun, Dec 22, 8:29 PM
Subscribers

Details

Reviewers
bartek
tomek
kamil
Summary

This differential uses blob service upload utility method and ability to conditionally skip olm session database update to encrypt notifications for Android client and upload its AES encrypted paylaod to blob service if its olm-encrypted
payload exceeds fcm push limits.

Test Plan

Firstly build the Android app without applying this differential. Send a couple of notifications and issue select * from olm_sessions; query in mariadb comm shell after each notification. This way you will see that each notification results in advancing olm session version by two. The reason is that we encrypt notifications with and without message infos, to see whether version with message infos fits into fcm limit after encryption. This is the issue since the client has to skip keys and keep them forever. This issue is resolved with this differential alongside with blob service upload feature introduction.

Now the REAL test plan for this differential:

  1. Change NEXT_CODE_VERSION to 0 in lib/shared/version-utils.js. Replace production blob service address with local blob service address in CommAndroidNotificationsBlobServiceClient.java and lib/facts/blob-service.js. Build Android app (can be emulator).
  2. Sent ordinary (small) notification. Issue select * from olm_sessions; query. Ensure that with each notification olm session version is advanced by ONE. This tests the bug fix.
  3. Background the app. Send big notification (type 5000 * 'x' in python shell and copy the string). Ensure that the banner is displayed in Android client. Kill the keyserver and open the app. Keyserver is killed but the message is visible since we fetched it from blob service.
  4. Kill blob service. Repeat step 3. Ensure banner is displayed but the message is not visible after opening the app. This tests that fallback behaviour works properly.

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

keyserver/src/push/crypto.js
93 ↗(On Diff #28926)

serializedPayload variable here keeps olm-encrypted stringified notification payload. However in the notification that is send to the client we keep in under encryptedPayload key so we need to validate the size of the entire { encryptedPayload: serializedPayload.body } JSON not just the data we get out of olm encrytion.

130 ↗(On Diff #28926)

for each Android notification we include information whether its payload (regardless of whether encryption was successful or not) exceeds some limit. If notification is encrypted and payloadSizeViolated is true then it means that encryption was successful, notification payload after encryption exceeds limit and hence, olm session state in database was not updated. We can further use this information to remove some data from notification, encrypt smaller notification and send it to the client without worrying that they will have to skip and infinitely keep some keys.

137 ↗(On Diff #28926)

Similarly, we have to check the size of entire notification that we send {data: {id, badgeOnly, encryptedPayload: string}} not just raw data we get out of encryption.

keyserver/src/push/send.js
828 ↗(On Diff #28926)

This change was never needed. Notification text is obtained with function notifTextsForMessageInfo that does necessary string trimming to at most 300 characters. So if we don't include messageInfos property (which is not trimmed) we are safe within FCM limits.

keyserver/src/push/types.js
17 ↗(On Diff #28926)

We shouldn't allow both of messageInfos and blobHash: string, encryptionKey: string to be present in notification at the same time, since the reason we include blobHash and encryptionKey is that notification with messageInfos was too large to be sent.

I haven't reviewed this closely... just leaving some minor notes.

Would appreciate it if one of the listed reviewers could take a closer look. It's been almost a week since this diff was put up... we should aim to take action on diffs within around 24 hours of when they are created.

keyserver/src/push/crypto.js
77 ↗(On Diff #28926)

Is there a reason this one isn't read-only?

keyserver/src/push/send.js
863 ↗(On Diff #28926)

Let's stick with >=

875 ↗(On Diff #28926)

Please keep all lines to a max of 80 chars

1341 ↗(On Diff #28926)

await should only appear as the first keyword in a line, or the first keyword on the right-hand side of an assignment

keyserver/src/push/types.js
17 ↗(On Diff #28926)

Flow generally gets confused when we use constructions like this (cc @bartek who dealt with it a lot)

Instead we should define the shared part and then have a union that spreads the shared part in each joined type:

type SharedAndroidNotificationPayload = {
  +badge: string,
  +body?: string,
  +title?: string,
  +prefix?: string,
  +threadID?: string,
  +encryptionFailed?: '1',
};

export type AndroidNotificationPayload =
  | {
      ...SharedAndroidNotificationPayload,
      +messageInfos?: string,
    }
  | {
      ...SharedAndroidNotificationPayload,
      +blobHash: string,
      +encryptionKey: string,
    };
  1. Add missing read-only.
  2. Refactor AndroidNotificationPayload type.
  3. Use >= for NEXT_CODE_VERSION check.
  4. Refactor blobUploadArror logging to keep max 80 characters per line.
  5. Put await as first keyword.
keyserver/src/push/send.js
863 ↗(On Diff #29163)

Do we really want to set it to NEXT_CODE_VERSION and not the FUTURE_ one? Services access token stuff might not be ready by that time

tomek requested changes to this revision.Jul 28 2023, 6:03 AM
tomek added inline comments.
keyserver/src/push/crypto.js
77 ↗(On Diff #28926)

Maybe exceeded instead of violated?

keyserver/src/push/send.js
870 ↗(On Diff #29163)

Is copyWithMessageInfos.data encrypted?

880 ↗(On Diff #29163)

When can this happen?

892 ↗(On Diff #29163)

Are there some ways of avoiding dependence on arrays ordering? (usages of idx)

Context: having the same ordering in independent arrays is fragile and can easily break by someone changing underlying functions.

828 ↗(On Diff #28926)

If we're reverting a change that we realized is not needed, can we do that in a separate diff?

This revision now requires changes to proceed.Jul 28 2023, 6:03 AM
keyserver/src/push/crypto.js
77 ↗(On Diff #28926)

Definitely no reason. My oversight.

keyserver/src/push/send.js
863 ↗(On Diff #29163)

That's valid point. Every usage of NEXT_CODE_VERSION should be replaced with FUTURE_CODE_VERSION.

870 ↗(On Diff #29163)

It is not encrypted. blobServiceUpload function AES-encrypts it before sending to blob service.

880 ↗(On Diff #29163)

blobServiceUpload function returns blobUploadError if upload fails or blobHash and encryptionKey if it succeeds. So basically this condition is true if either blobUploadError is truthy or none of blobHash, encryptionKey, blobUploadError is truthy. The latter happens if canQueryBlobService is falsy. This code would be more readable if I defined blobHash, encryptionKey, blobUploadError inside canQueryBlobService if statement. On the other hand it would increase indentation. I wanted to stick to conventions even if it meant writing code that takes a bit more time to understand.

keyserver/src/push/crypto.js
77 ↗(On Diff #28926)

I will introduce this change shortly.

tomek added inline comments.
keyserver/src/push/send.js
905 ↗(On Diff #29437)

Is it possible that the size is exceeded only for some devices?

This revision is now accepted and ready to land.Aug 2 2023, 3:50 AM
keyserver/src/push/send.js
905 ↗(On Diff #29437)

It is possible that for some devices notification encryption fails while for some succeeds. In such case it is possible that unencrypted notification with messageInfos does not exceed size limit while encrypted notification with messageInfos exceeds zie limits.

It is a little bit of an edge-case, so if you consider it an over-engineering I can change this behaviour to check encrypted notification size for only one device. I wasn't sure myself but eventually I decided if it doesn't complicate code too much to cover all cases then we should do so.

Rename payloadSizeViolated to payloadSizeExceeded and devicesWithSizeViolation to devicesWithExcessiveSize

Those changes will be included in updated version of parent differential.