Page MenuHomePhabricator

[native/sqlite] refactor code to perform migration inside one transaction
ClosedPublic

Authored by kamil on Oct 17 2022, 5:57 AM.
Tags
None
Referenced Files
F3246491: D5377.id17951.diff
Fri, Nov 15, 12:03 AM
F3246466: D5377.id17939.diff
Thu, Nov 14, 11:56 PM
F3246465: D5377.id18417.diff
Thu, Nov 14, 11:56 PM
F3246458: D5377.id17670.diff
Thu, Nov 14, 11:55 PM
F3246441: D5377.id17607.diff
Thu, Nov 14, 11:48 PM
F3246419: D5377.id17943.diff
Thu, Nov 14, 11:37 PM
F3246402: D5377.id18067.diff
Thu, Nov 14, 11:31 PM
F3246395: D5377.id17945.diff
Thu, Nov 14, 11:30 PM

Details

Summary

We want to perform getting version, migration, and updating version inside one transaction.

Unfortunately, migration 22 which is setting journal mode can not be done inside the transaction, because it will result in an error: cannot change into wal mode from within a transaction (1). That is why first we need to check the version, perform if shouldBeInTransaction is false, then start the transaction and again check the version (If the app will run into issues like those described in previous links in this time another thread could already updated the version).
If another thread will do it within getting the version and performing migration (not in the transaction) we do not run into issues because setting journal mode is an idempotent operation.

Note: whenever code will face issues right now there are break and return statements but it will be handled in a different way after implementing ENG-1974

Test Plan

Perform migrations (decrement version or logout user) and check if

  • DB schema has the proper structure
  • journal_mode is set properly to value wal

Text case described here:
https://phab.comm.dev/D5377#161885

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kamil edited the test plan for this revision. (Show Details)
kamil added a reviewer: atul.
kamil edited the test plan for this revision. (Show Details)
kamil published this revision for review.Oct 17 2022, 6:41 AM
atul requested changes to this revision.Oct 19 2022, 1:33 PM

I think this needs to be cleaned up some more before it's ready for review

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
580–582

Confused what's happening here.

This revision now requires changes to proceed.Oct 19 2022, 1:33 PM
In D5377#159644, @atul wrote:

I think this needs to be cleaned up some more before it's ready for review

Looks like while making diff I chose the wrong line to commit... I was sure it wasn't there, what is more, test case will fail with this. Anyway sorry. for taking your time and confusing you, now should work.

If there is still something confusing and the summary is not decent let me know.

atul requested changes to this revision.Oct 20 2022, 7:36 AM

Have some questions inline... but high level it looks like we're trying to do multiple things in this diff

  1. We want to ROLLBACK when a migration fails.
  2. We want to ROLLBACK in the idx <= db_version_inside_transaction scenario.

Each of these can be addressed in separate diff to make things easier to review.

I also think we can achieve both of those goals without having to duplicate things for the transaction/non-transaction scenarios using the if (shouldBeInTransaction) { pattern to wrap transaction related queries

Anyway sorry. for taking your time and confusing you

No worries, this is just an area of the codebase where we need to be extra careful/paranoid to make sure things don't break.

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
565–570 ↗(On Diff #17670)

Looks like we're essentially copy/pasting this from line 581-587.

Could we instead do something like

auto rc = applyMigration(db);
if (!rc) {
  migration_msg << "migration " << idx << " failed." << std::endl;
  Logger::log(migration_msg.str());
  if (shouldBeInTransaction) {
    sqlite3_exec(db, "ROLLBACK;", nullptr, nullptr, nullptr);
  }
  break;
}

Basically keep the pattern from before where anything transaction related was wrapped in if (shouldBeInTransaction) conditional?

574–577 ↗(On Diff #17670)

In what scenario is it possible for idx <= db_version_inside_transaction to be true here?

This revision now requires changes to proceed.Oct 20 2022, 7:36 AM
  1. We want to ROLLBACK when a migration fails.

I think we could just do something like this: https://phab.comm.dev/D5433?

In D5377#160227, @atul wrote:
  1. We want to ROLLBACK when a migration fails.

I think we could just do something like this: https://phab.comm.dev/D5433?

That only partially fixes the problem, the problem in the previous version of code that on error we do not rollback transaction.

The whole point of this diff is to

  1. Get the version
  2. Perform migration
  3. Update the version

in one transaction, the solution you suggested wraps only point 2 in the transaction.

More details here: https://linear.app/comm/issue/ENG-1910#comment-8e8a1fe9

The code duplication is caused only by the fact that we have to additionally support migrations that are not capable of being performed inside transactions (like setting journal_mode), but if you have any better idea on how to handle this I will appreciate any suggestion.

Answering only here because I think it covers also inline comments.

kamil requested review of this revision.Oct 20 2022, 8:44 AM

Does the following stack of two commits achieve both goals D5440 D5441?

atul requested changes to this revision.Oct 20 2022, 9:02 AM
This revision now requires changes to proceed.Oct 20 2022, 9:02 AM
In D5377#160412, @atul wrote:

Does the following stack of two commits achieve both goals D5440 D5441?

I think so!

The only problem is that when there is a migration that shouldn't be in a transaction checking and updating version are not in one transaction - in my solution they are.
The whole problem described in the issue I linked in the summary is probably some other thread decreased the database version and with @tomek we thought that this should reduce the risk of this scenario (I know we have different mechanisms right now like GlobalDBSingleton but we wanted to make this code as safe as possible without looking at other code dependencies).

If you @atul and @tomek think it's safe in the way Atul proposed I am happy to implement this one - much shorter and easier to understand.

kamil requested review of this revision.Oct 21 2022, 1:09 AM
kamil added a reviewer: tomek.
marcin requested changes to this revision.Oct 21 2022, 1:53 AM

(If the app will run into issues like those described in previous links in this time another thread could already updated the version).
If another thread will do it within getting the version and performing migration (not in the transaction) we do not run into issues because setting journal mode is an idempotent operation.

Such scenario is currently impossible. Different threads access database via GlobalDBSingleton. They put tasks (in the form of C++ lambdas) to the queue, and dedicated database thread executes them in FIFO manner (by consuming queue). This way tasks scheduled to GlobalDBSingleton are executed atomically. SQLiteQueryExecutor is thread_local, so every thread will instantiate its own SQLiteQueryExecutor (and hence schedule migrate() to be executed on GlobalDBSingleton), but due to atomicity of database tasks it is not possible that some migrations will be performed by one's thread SQLiteQueryExecutor constructor while others by another's.

Those changes look OK, but we must be aware that they do not resolve potential issue but rather prepare code to a situation if we enable multithreaded access to SQLite and resign from GlobalDBSingleton at all.
Requesting changes to make sure this comment is read and addressed.

This revision now requires changes to proceed.Oct 21 2022, 1:53 AM
kamil requested review of this revision.Oct 21 2022, 2:03 AM

(If the app will run into issues like those described in previous links in this time another thread could already updated the version).
If another thread will do it within getting the version and performing migration (not in the transaction) we do not run into issues because setting journal mode is an idempotent operation.

Such scenario is currently impossible. Different threads access database via GlobalDBSingleton. They put tasks (in the form of C++ lambdas) to the queue, and dedicated database thread executes them in FIFO manner (by consuming queue). This way tasks scheduled to GlobalDBSingleton are executed atomically. SQLiteQueryExecutor is thread_local, so every thread will instantiate its own SQLiteQueryExecutor (and hence schedule migrate() to be executed on GlobalDBSingleton), but due to atomicity of database tasks it is not possible that some migrations will be performed by one's thread SQLiteQueryExecutor constructor while others by another's.

Those changes look OK, but we must be aware that they do not resolve potential issue but rather prepare code to a situation if we enable multithreaded access to SQLite and resign from GlobalDBSingleton at all.
Requesting changes to make sure this comment is read and addressed.

Get your point that is currently impossible but as I said in a previous comment:

with @tomek we thought that this should reduce the risk of this scenario (I know we have different mechanisms right now like GlobalDBSingleton but we wanted to make this code as safe as possible without looking at other code dependencies).

I know that is not a break through, I just follow what was approved while fixing a recent bug (ENG-1910).
Re-requesting a review to put this back in @tomek and @atul's queue.

marcin requested changes to this revision.Oct 24 2022, 6:42 AM

Perform migrations (decrement version or logout user) and check if

DB schema has the proper structure
journal_mode is set properly to value wal

This test plan is not comprehensive enough. We must:

  1. Temporarily change sqlite compilation settings for both iOS and Android so that multithreaded access to sqlite is enabled.
  2. Create several threads that call DatabaseManager::getQueryExecutor() simultaneously outside of GlobalDBSingleton context. Make sure that database is not present yet at the time those constructors are executed and that migrations are actually performed by different threads (add logging which thread performed which migration)
  3. Ensure that produced database schema and content is correct

Only then this differential can be landed.

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
565–570 ↗(On Diff #17670)

I agree with @atul. This code needs to be refactored so that auto rc = applyMigration(db); line appears only once.

574–577 ↗(On Diff #17670)

This code is intended to work correctly even if in future we enable multithreaded access to the database and many threads at once will call SQLiteQueryExecutor constructor (which calls migrate()). Then two threads might start the same migration simultaneously and if one of them finishes earlier then db_version_inside_transaction will be higher than idx for the other.

This revision now requires changes to proceed.Oct 24 2022, 6:42 AM

Thanks for suggesting this test @marcin, I am a little bit skeptical because in the concurrent environment we will never be 100% sure it works, only if we will have luck we will find something which might not work but here is what I did:

  1. Temporarily change sqlite compilation settings for both iOS and Android so that multithreaded access to SQLite is enabled.

I narrowed this only to iOS where we have multithreaded access because we testing shared code (but I can add implementation for Andriod if you think it's needed later on)

Create several threads that call DatabaseManager::getQueryExecutor() simultaneously outside of GlobalDBSingleton context. Make sure that database is not present yet at the time those constructors are executed and that migrations are actually performed by different threads (add logging which thread performed which migration)

I did this in AppDelegate on app start - it's easier and we are sure that any other getQueryExecutor call will not interfere with test, code:

[self attemptDatabaseInitialization];
NSLog(@"Clear previous db to make all migrations run");
comm::DatabaseManager::getQueryExecutor().clearSensitiveData();

NSLog(@"Starting test process...");
std::vector<comm::WorkerThread*> workers;
for(int i = 0; i < 10; i++) {
 workers.push_back(new comm::WorkerThread("worker-" + std::to_string(i)));
}
NSLog(@"Threads created");
for(auto worker: workers) {
 worker->scheduleTask([=]() {
  comm::DatabaseManager::getQueryExecutor();
 });
}
NSLog(@"Jobs started...");

If you care about logs Logger does the job for us,
2022-10-25 15:41:17.207031+0200 Comm[21320:165165] COMM: db version: 0
the value after the colon (here 165165) is the thread indicator.
I will not attach entire logs because it's a lot of data but I have those saved if anyone will be interested

1. Test the latest master
The problem is that each thread started the migration process at one time, first one started writing to the database and all others failed with the error COMM: Error creating 'drafts' table: database is locked, that being said only the first thread which did not fail, finished the migration process.

2. Test the latest master (with delay)
Assuming that all threads will start at the same time is not the best, so I am adding a delay to migrate() function

std::chrono::milliseconds timespan(1000); 
std::this_thread::sleep_for(timespan);

Now it depends on the run (I tested this multiple times) but usually, most threads fail as in point 1, however, there are some (~2, 3) threads that continue executing because they found empty slots and do not fail, part of logs:

2022-10-25 16:07:50.245311+0200 Comm[22373:184571] COMM: migration 16 succeeded.
2022-10-25 16:07:50.483249+0200 Comm[22373:184630] COMM: migration 17 succeeded.
2022-10-25 16:07:51.252446+0200 Comm[22373:184571] COMM: migration 17 succeeded.
2022-10-25 16:07:51.496106+0200 Comm[22373:184630] COMM: migration 18 succeeded.
2022-10-25 16:07:52.263907+0200 Comm[22373:184571] COMM: migration 18 succeeded.
2022-10-25 16:07:52.506195+0200 Comm[22373:184630] COMM: migration 19 succeeded.
2022-10-25 16:07:53.272700+0200 Comm[22373:184571] COMM: migration 19 succeeded.
2022-10-25 16:07:53.518424+0200 Comm[22373:184630] COMM: migration 20 succeeded.
2022-10-25 16:07:54.285700+0200 Comm[22373:184571] COMM: migration 20 succeeded.

two different threads remembered the initial database version and keep working (it won't fail because we made migrations idempotent previously) - but that's the case we want to avoid.

3. Test this diff (without delay)
Same as in point 1

4. Test this diff (with delay)
Each migration is performed only once and it's done by different threads.

In each test case database schema was created properly.

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
565–570 ↗(On Diff #17670)

I agree with @atul. This code needs to be refactored so that auto rc = applyMigration(db); line appears only once.

Could you elaborate on why? I know it's code repetition but if will get rid of this we will change the functionality which makes me confused

kamil requested review of this revision.Oct 25 2022, 7:28 AM
kamil edited the test plan for this revision. (Show Details)
native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
565–570 ↗(On Diff #17670)

Not necessarily. I think we can:

get_current_db_version()
if(shouldBeInTransaction){
"BEGIN TRANSACTION"
}
execute_migration();
if(shouldBeInTransaction){
"ROLLBACK" or "END";
} else{
...
}

The above is pseudocode but I hope it is readable enough to convey what me (and probably @atul) mean.

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
565–570 ↗(On Diff #17670)

I got your point of view, but in the previous comment, I described the differences: https://phab.comm.dev/D5377#160924...
I am okay with doing this that way, I just saying why implemented this differently for the first one because I felt like this comment was only about code repetition, soif you are okay with this after reading my explanation, I am happy to make this code simpler!

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
565–570 ↗(On Diff #17670)

It's not clear if proposed simplifications will modify the code in a way that it remains correct.

When we have a migration that requires transaction we would like to:

  1. Start transaction
  2. Get db version
  3. End transaction if the migration is already applied
  4. Execute migration
  5. Update db version
  6. End transaction

When the transaction isn't needed:

  1. Get db version
  2. End this migration if it was already applied
  3. Start transaction
  4. Get db version
  5. Update db version
  6. End transaction

The problem is that we have to keep things in order:

  1. Get db version
  2. Execute migration
  3. Update version

And the 2nd point may or may not be in the transaction. At the same time getting db version and updating it should be one transaction. This makes it really hard to avoid repeating code.

It seems like the main issue is that we're calling migration in two different places, each time in mutually exclusive condition, but still the code is fragile.

If we really want to avoid that, a draft of pseudocode might look like this

if(shouldBeInTransaction){
"BEGIN TRANSACTION"
}
get_current_db_version()
if (migration already applied) {
  break or rollback
}
execute_migration();
if (!shouldBeInTransaction) {
  BEGIN TRANSACTION
  get_current_db_version()
}
update db version
END TRANSACTION

more or less... but still I don't expect it to be significantly simpler.

refactor logic to avoid code duplication

I did some changes to avoid calling applyMigration in two (mutually exclusive) conditions like @atul and @marcin prefers.

I implemented @tomek's suggestion from https://phab.comm.dev/D5377#162110.

divide logic into two separate functions

After consulting this with @tomek we decided to divide this into two separate functions - merging this flow into one is possible but the code is hard to understand, so this should improve readability.

tomek requested changes to this revision.Oct 27 2022, 6:56 AM

Looks great! It a lot more readable now! Just a couple of inline comments.

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
539 ↗(On Diff #17943)

We should avoid using plain enums and instead use enum class which is better typed.

542 ↗(On Diff #17943)

It is a good idea to use consts where possible

554–555 ↗(On Diff #17943)

Shouldn't we check if setting version / committing transaction was successful?

571 ↗(On Diff #17943)

To be super safe, we should open transaction, read version, write it and commit.

This revision now requires changes to proceed.Oct 27 2022, 6:56 AM

use class enum, change type to const reference

native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
554–555 ↗(On Diff #17943)

For sure! But this is just a refactor maintaining old behavior - I will add this when we will have a proper mechanism for error handling, right now even if we discover failure we can do nothing about this.

571 ↗(On Diff #17943)

Yea I made a mistake while committing changes and updated this while you were reviewing the old version, sorry for that.
Is it okay now?

tomek added inline comments.
native/cpp/CommonCpp/DatabaseManagers/SQLiteQueryExecutor.cpp
577 ↗(On Diff #17951)

This value is tricky, because we had to execute the migration code to be here. So for me it's more like a success than not applied. But this is an edge case and what we should return here should depend on what we're going to do with the returned value.
Currently we're using this value to log a message, so not applied is more appropriate.

554–555 ↗(On Diff #17943)

For me it seems like introducing new functions which return MigrationResult should result in this value being properly determined. Handling the return value may be addressed later, but for me it feels like these functions should be correct from the beginning. But up to you.

This revision is now accepted and ready to land.Oct 28 2022, 12:57 AM

check for failure on setting db verison