Page MenuHomePhabricator

Implement stateful and deferrable notification decryption and use it on iOS in NSE
ClosedPublic

Authored by marcin on Oct 6 2023, 9:37 AM.
Tags
None
Referenced Files
F3246277: D9401.id31993.diff
Thu, Nov 14, 10:29 PM
F3246170: D9401.id32215.diff
Thu, Nov 14, 9:47 PM
F3245614: D9401.diff
Thu, Nov 14, 7:40 PM
Unknown Object (File)
Mon, Nov 11, 1:24 PM
Unknown Object (File)
Sun, Nov 10, 5:05 AM
Unknown Object (File)
Thu, Nov 7, 8:27 AM
Unknown Object (File)
Mon, Oct 28, 7:18 PM
Unknown Object (File)
Mon, Oct 28, 10:36 AM
Subscribers

Details

Summary

This differential implements additional methods in NotificationsCryptoModule that are stateful so that we cen decrypt notification and decide to upda te
notification olm file later. This is necessary to test potential theory that BAD_MESSAGE_MAC error stems from NSE callback being retried by iOS before the first run
calls contentHandler but correctly decrypts notification.

Test Plan
  1. Add this method to NotificationsCryptoModule.cpp: https://gist.github.com/marcinwasowicz/f81464d953894bb65f7fd0a891f7def8
  2. Call it inn NSE before decryption and after decryption. In both cases log the result.
  3. Send two notifications.
  4. Examine console logs from NSE.
  5. Ensure that the second pretty print of olm file for the first notification is the same as the first pretty print for the second notification. This will ensure that the file was indeed updated.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

native/ios/NotificationService/NotificationService.mm
224 ↗(On Diff #31749)

I experimented a bit and found that putting this code after we call contentHandler also works. Additionally Apple docs never state calling contentHandler must be the very last action we take in NSE callback. They state that we must call the contentHandler "at some point in implementation" https://developer.apple.com/documentation/usernotifications/unnotificationserviceextension/1648229-didreceivenotificationrequest#discussion. Nevertheless it seems rather risky to do so in production since there are no guarantees as to how much time we have left after calling contentHandler. We are only guaranteed to have 30 seconds between calling NSE callback and calling contentHandler.

459 ↗(On Diff #31749)

I decided to remove this method since it didn't bring much value and keeping it would force us to have two methods with signatures ending with withCryptoModule:(comm::NotificationsCryptoModule &)statefulNotifCryptoModule which I consider boilerplate.

native/ios/NotificationService/NotificationService.mm
224 ↗(On Diff #31749)

If this diff doesn't change anything we can consider to call flushState after calling the contentHandler. However we wouldn't call it directly here but we would dispatch it to the main queue. If NSE indeed retries didReceive... then perhaps it doesn't retry the whole process but only some threads. In such case the main thread should remain.

marcin requested review of this revision.Oct 6 2023, 9:54 AM

Avoiding "request changes" to keep this on other reviewers' queues

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.h
17 ↗(On Diff #31749)

This feels a little weird...

  1. It is always instantiated, even if it's not needed
  2. If multiple callers call the same NotificationsCryptoModule, there can be issues if they use statefulCryptoModule. This is a confusing "contract" to the caller, who ends up needing to construct a distinct NotificationsCryptoModule to be safe
  3. We have a single class supporting two very different use cases

Can you refactor this to be cleaner? Here's a suggestion:

  1. You could have a utility function to create a statefulCryptoModule in NotificationsCryptoModule, but make it an "opaque type" that the caller can't do anything with (eg. void *)
  2. Then you could have utility functions in NotificationsCryptoModule that handle a statefulCryptoModule individually. This would return NotificationsCryptoModule to being thread-safe (correct me if it's not)
  3. At this point, we could remove the instantiation of a unique NotificationsCryptoModule from NotificationService, and simply used the aforementioned utilities

Curious for @tomek and @bartek's perspectives on this as well.

native/ios/NotificationService/NotificationService.mm
224 ↗(On Diff #31749)

In our 1:1 today, I discussed the tradeoffs around this decision with @marcin.

If we leave it as-is in this diff currently, then there is still a possibility of the NSE "failing" (in an unobservable way) between the persistence step and the display of the notif. If that happens, we'll see the same error we're currently seeing in ENG-4866. That may force us to question whether this solution is effective or not. To be able to distinguish, we'll need to wait a long enough period to discern whether the issue is occurring less than before.

On the other hand, if we reorder these calls, we might potentially lead to a scenario where the updated Olm state is not persisted successfully. This should not lead to any loss of product functionality, but if it happens repeatedly it can hurt performance, and potentially lead to some notifs not being processed successfully if they are delivered out-of-order. Based on @marcin's testing above it seems like this is not likely, but it is possible.

My preference would probably be to reorder the calls, but I don't feel too strongly.

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.h
17 ↗(On Diff #31749)

It is always instantiated, even if it's not needed

It is an instance variable so it is not instantiated unless we instantiate NotificationsCryptoModule. Calls to static methods of this class do not instantiate this variable.

Each instance of NotificationsCryptoModule has its own statefullCryptoModule so as long as NotificationsCryptoModule instance is instantiated per thread there are no risks.

I agree that the solution you proposed might be cleaner but I am not sure if it is actually safer. If nothing prevents the caller from passing NotificationsCryptoModule instance between threads then what should prevent them from passing returned statefullCryptoModule between threads?

I am willing to proceed with your solution. This point in particular convinces me that is might be better approach:

We have a single class supporting two very different use cases

However I just wanted to note that there are no technical issues with this code.

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.h
17 ↗(On Diff #31749)

A solution where we have utils returning a stateful instance sounds cleaner and more maintainable. As for the thread safety, it indeed seems like there isn't too much difference - either both approaches are thread-safe, or none of them. We're not passing NotificationsCryptoModule between threads and we won't be doing that with a stateful instance, so it should be safe.

native/ios/NotificationService/NotificationService.mm
224 ↗(On Diff #31749)

Agree, it seems that reordering the calls makes sense. It sounds, based on the docs https://developer.apple.com/documentation/usernotifications/unnotificationserviceextension/1648229-didreceivenotificationrequest#discussion, that we can call the handler whenever we want. It might happen that e.g. flushing the state will take too much time and will time out, but as described by @ashoat it shouldn't be a huge issue.

Wondering, how long on average, it takes to process a notification - if we are orders of magnitude faster than the limit (not more than 30s), or just close to it.

Remove NotificationsCryptoModule instantiation. Return opaque type containing crypto module state accessible only from NotificationsCryptoModule implementation file.

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.cpp
312–314 ↗(On Diff #31869)

std::make_unique<CryptoModuleStateHolder>(cryptoModule) won't work since the constructor of CryptoModuleStateHolder is private so std::make_unique would have to be a friend method of CryptoModuleStateHolder class.

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.h
54–57 ↗(On Diff #31869)

This is the only way to return a type that is genuinely opaque outside of NotificationsCryptoModule implementation. Returning something like void* doesn't help since we can always cast.

If any reviewer finds the resulting code too complicated and request changes I can refactor it to just return std::unique_ptr<crypto::CryptoModule> from NotificationsCryptoModule.

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.cpp
147–148 ↗(On Diff #31869)

The naming of NotificationsCryptoModule and CryptoModule is pretty confusing. From usage, it seems like NotificationsCryptoModule is a singleton, whereas CryptoModule is some piece of data we pass around between places...

Can you put up a separate diff that renames CryptoModule? Maybe to something like OlmState?

I may be misunderstanding what's going on here, but it feels to me that the similar names here are misleading.

292 ↗(On Diff #31869)

What exactly is going on here?

Docs seem to indicate that the parameters to make_unique are passed as parameters to the constructor of the parameterized type (CryptoModule) in this case.

But I can't find a copy/move constructor for CryptoModule, so it's confusing to me how CryptoModule's constructor can be called with a parameter of CryptoModule.

314 ↗(On Diff #31869)

Won't this std::move be automatically implied?

316 ↗(On Diff #31869)

Won't this std::move be automatically implied?

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.h
60 ↗(On Diff #31869)

I can't tell what's going on from this return signature.

Instead of returning some sort of pair of some sort of string with a CryptoModuleStateHolder, I think it would be better to return something like a struct with named parameters.

Here's an idea: what about extending CryptoModuleStateHolder to include the decrypted data? We could rename it to something like DecryptionResult. I think that might be a cleaner solution

native/ios/NotificationService/NotificationService.mm
229 ↗(On Diff #31869)

Is this std::move necessary? It seems like the compiler should be able to automatically use a move constructor

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.cpp
292 ↗(On Diff #31869)

If there are no copy constructor declared then the compiler generates the default copy constructor for us: https://en.cppreference.com/w/cpp/language/copy_constructor. This behaviour can be manually disabled but we don't do so in our code. The same goes for move constructor.

314 ↗(On Diff #31869)

We can omit std::move here.

316 ↗(On Diff #31869)

I won't be. cryptoModuleStatePtr is an l-value which must either be copied or moved manually: https://stackoverflow.com/questions/38985127/unique-ptr-in-member-initialization-list. std::unique_ptr has deleted copy constructor so we have to use std::move manually. The code won't compile without this std::move.

native/ios/NotificationService/NotificationService.mm
229 ↗(On Diff #31869)

We have to use std::move since flushState expects std::unique_ptr by value but std::unique_ptr cannot be copied.

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.h
60 ↗(On Diff #31869)

Sounds good. I would however opt for a name like StatefullDecryptResult. It better reflects what actually is stored here since plain DecryptResult sounds like just decrypted data.

Make decrypted data a part of returned structure.

ashoat added inline comments.
native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.cpp
315–318 ↗(On Diff #31993)

I don't think it's necessary to first construct statefulDecryptResult on the stack. Can't we directly construct on the heap, by using the main constructor instead of the copy constructor?

147–148 ↗(On Diff #31869)

I don't think this comment ever received a response. Did you put up a diff already? If so, can you please link it? If not, please make sure you either (1) create a diff and link it here before landing, or (2) create a follow-up task and link it here before landing

This revision is now accepted and ready to land.Oct 16 2023, 1:50 PM
native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.cpp
315–318 ↗(On Diff #31993)

In current settings we can't do so since StatefulDecryptResult constructor is private. It is available to NotificationsCryptoModule since it is friend class, but std::make_unique<T> method is not friend class of StatefulDecryptResult so it cannot call its constructor. Code with the change suggested above won't compile.

native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.cpp
147–148 ↗(On Diff #31869)
native/ios/NotificationService/NotificationService.mm
224 ↗(On Diff #31749)

Based on @marcin's testing above it seems like this is not likely, but it is possible.

Further testing by @marcin revealed that this may happen around 1 out of every 5 times. As such, we're going to revert to the earlier approach where we flush before calling the contentHandler.

  1. Place call to contentHandler at the very end of notification processing.
  2. Rebase before landing.
native/cpp/CommonCpp/Notifications/BackgroundDataStorage/NotificationsCryptoModule.cpp
292 ↗(On Diff #31869)

In the future, when we think about using an implicitly-defined copy constructor, we should make sure it doesn't do anything dangerous. This ended up causing ENG-5354