Page MenuHomePhabricator

[native] introduce atomic flag that indicates clearing database process will execute
ClosedPublic

Authored by kamil on Nov 9 2022, 2:50 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sat, Jan 4, 6:02 PM
Unknown Object (File)
Mon, Dec 16, 8:33 PM
Unknown Object (File)
Mon, Dec 16, 8:33 PM
Unknown Object (File)
Mon, Dec 16, 8:33 PM
Unknown Object (File)
Mon, Dec 16, 8:33 PM
Unknown Object (File)
Mon, Dec 16, 8:33 PM
Unknown Object (File)
Mon, Dec 16, 8:33 PM
Unknown Object (File)
Mon, Dec 16, 8:32 PM

Details

Summary

context: ENG-2139

The flag will allow indicating (while executing task) that the task should be canceled (because e.g. clearing the database was scheduled).

Test Plan
  1. Build android app
  2. Build iOS app
  3. check on logout (by adding logs) that the flag was properly set and unset after calling cleatSensitiveData.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

kamil held this revision as a draft.
kamil edited the test plan for this revision. (Show Details)
kamil published this revision for review.Nov 9 2022, 5:30 AM

Unrelated to this diff, but wondering: why does Android initiate multithreadingEnabled(true) but iOS does multithreadingEnabled(false)?

Unrelated to this diff, but wondering: why does Android initiate multithreadingEnabled(true) but iOS does multithreadingEnabled(false)?

I had discussion about this with @tomek at the initial stage of work on GlobalDBSingleton. Before introduction of GlobalDBSingleton application created a dedicated database thread when CommCoreModule was created - when application foregrounded and started JavaScript thread. GlobalDBSingleton is meant to provide single threaded access to database not only when it is foregrounded but also in the background in response to notifications. One idea was to create dedicated database thread from the beginning of application existence. This would mean that if the application is started by the system to respond to notification we would create new thread (dedicated database thread). We didn't see it to be an issue on Android, however I was told that it might be an issue on iOS. Apple might detect that we create additional thread (use more resources) to respond to notification and send them less frequently. Therefore we decided, that GlobalDBSingleton will create dedicated database thread from the beginning on Android and only after JavaScript starts (CommCoreModule is created) on iOS. On iOS before JavaScript starts every database task is asynchronously dispatched on the application's main thread queue (main thread will execute it).

native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1069 ↗(On Diff #18254)

What about DatabaseManager::getQueryExecutor().clearSensitiveData(); failure? Shouldn't we also set tasksCancelled flag to false again? Currently if DatabaseManager::getQueryExecutor().clearSensitiveData() fails we catch the error in JavaScript and kill the app. So when the app is restarted, the GlobalDBSingleton class is loaded again with default false value. Nevertheless the code on the C++ side should be agnostic of what is happening in JavaScript. Do we have a reason to keep this flag to true in case of clearSensitiveData failure?

Thanks for explaining, @marcin!

native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1069 ↗(On Diff #18254)

It looks like tasksCancelled is indeed set to false in the case of clearSensitiveData() failure, no? The try-catch will catch the exception, and then setTasksCancelled(false) will run. Let me know if I'm missing something

marcin added a reviewer: tomek.
marcin added inline comments.
native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1069 ↗(On Diff #18254)

Actually you are right - I don't know why didn't notice it. My oversight.

Haven't worked with std::atomic before so will need to do some reading up before reviewing this... but based on the C++ book I've been using as a reference:

This chapter doesn’t discuss std::atomic or how to devise your own lock- free solutions in detail, because it’s incredibly difficult to do correctly and is best left to experts. However, in simple situations, such as in Listing 19-10, you can employ a std::atomic to make sure that the increment or decrement operations cannot be divided.

Unless you have a very simple concurrent access problem, such as the one in this section, you really shouldn’t try to implement lock-free solutions on your own. Refer to the Boost Lockfree library for high-quality, thoroughly tested lock-free containers. As always, you must decide whether a lock-based or lock-free implementation is optimal.

So we should probably be very careful with whatever we do here.

Is there a reason we can't use std::mutex and std::lock_guard instead? That's a pattern we use elsewhere in the codebase and more importantly std::mutex + lock-based synchronization is something I understand in much more depth than lock-free concurrency...

atul requested changes to this revision.Nov 14 2022, 5:02 PM

Can you explain how/where/when the value of tasksCancelled will be consumed?

native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1065–1073 ↗(On Diff #18374)

There's no exclusive access to tasksCancelled between when line 1065 and line 1073 execute. Is that fine? I'd imagine we want some sort of std::lock_guard to make sure setTasksCancelled can't be accessed by anything else in between?

It's kind of hard to know what the right design is here since I don't know how you intend to "consume" the value of tasksCancelled.

This revision now requires changes to proceed.Nov 14 2022, 5:02 PM
In D5576#166868, @atul wrote:

Is there a reason we can't use std::mutex and std::lock_guard instead? That's a pattern we use elsewhere in the codebase and more importantly std::mutex + lock-based synchronization is something I understand in much more depth than lock-free concurrency...

Honestly, I think that mutexes are more complicated and error-prone than atomic variables. Designing a lock-free solution is indeed complicated, but in our case we don't need synchronization, but only visibility so it's a lot easier. Using mutexes will be huge overkill when the only thing we need is for one thread to see an update that other thread might have done. So basically, our goal is to avoid data races https://en.cppreference.com/w/cpp/language/memory_model.

native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1065–1073 ↗(On Diff #18374)

setTaskCancelled performs assignment to atomic variable. taskCancelled` is consumed in the same fashion - an atomic read. Therefore, as far as I understand there is no need to use mutexes or other forms of locking.

In D5576#166970, @tomek wrote:
In D5576#166868, @atul wrote:

Is there a reason we can't use std::mutex and std::lock_guard instead? That's a pattern we use elsewhere in the codebase and more importantly std::mutex + lock-based synchronization is something I understand in much more depth than lock-free concurrency...

Honestly, I think that mutexes are more complicated and error-prone than atomic variables. Designing a lock-free solution is indeed complicated, but in our case we don't need synchronization, but only visibility so it's a lot easier. Using mutexes will be huge overkill when the only thing we need is for one thread to see an update that other thread might have done. So basically, our goal is to avoid data races https://en.cppreference.com/w/cpp/language/memory_model.

@tomek, you're not wrong in a general sense... yes, mutexes are complicated. But did you read @atul's inline comment, though? It seems pretty clear that he is right, and the current design is simply insufficient for what we need to do.

native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1065–1073 ↗(On Diff #18374)

I think @atul is right here. Imagine if two clearSensitiveData calls are happening simultaneously. The first one might finish and then call setTasksCancelled(false) while the other one is still deleting the database. We either need an atomic int to count the number of entrances into this code block, or we need to do something to guarantee only one entrance at a time (probably better, and what @atul was suggesting)

native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1065–1073 ↗(On Diff #18374)

Imagine if two clearSensitiveData calls are happening simultaneously.

I might be missing something, but for me it doesn't sound possible. We're scheduling the tasks on MPMCQueue and execute them one-by-one (using WorkerThread), so they shouldn't happen at the same time.

But still, there's a bug here: we should've called GlobalDBSingleton::instance.setTasksCancelled(true); inside the scheduled task.

native/cpp/CommonCpp/NativeModules/CommCoreModule.cpp
1065–1073 ↗(On Diff #18374)

Ah, good point about the WorkerThread... you're right, sorry I missed that!! And great catch on that bug

set tasksCancelled to false after task execution

I've discussed this change with @tomek but sharing what we established.

@atul and @ashoat fairly pointed problem with the previous solution, here is the vulnerable scenario:

  1. there are some tasks in the queue scheduled previously
  2. two clearSensitiveDatas are scheduled simultaneously (note that there is no possibility to execute those at the same time but scheduling at the same time is possible), and flag tasksCancelled is set to true because setting flag takes place before actual task execution
  3. tasks that were in the queue are canceled
  4. first clearSensitiveData execute and set tasksCancelled to false
  5. second clearSensitiveData execute but the flag is set to false which means another tasks can be added to a queue which should be avoided.

to avoid this the flag is not set to false after task execution directly but setting the flag is scheduled as another non-cancellable task which means it will set the flag after execution of second`clearSensitiveData` in the previous example.

But still, there's a bug here: we should've called GlobalDBSingleton::instance.setTasksCancelled(true); inside the scheduled task.

we don not consider this as bug, what is more it will cause that tasks already scheduled in queue will not be cancelled. We can add this additionally but I don't think there is a need to do it.

Thanks for explaining!

This revision is now accepted and ready to land.Nov 16 2022, 7:55 PM