Details

Reviewers

tomek
ashoat

Commits

rCOMM762261bd4c4f: [server] Migrate database after permission updates

Summary

The diff migrates database to recalculate perimissions after changing default permissions for OPEN_SUBTHREADs.
I put updateRolesAndPermissionsForAllThreads function in thread-permission-updaters, but the function itself is taken from add-edit-thread-detailed-permissions.js migration script introduced in D1979.
As the script did exactly the same thing - recalculated permissions after introducing changes in default permission configuration - and we no longer use seaprate scripts for migrations, I think, it's better to put this function in updaters so it could be reused in the future if there is a need.

Test Plan

Migration is performed automatically when server starts. After the migration the issue: https://linear.app/comm/issue/ENG-1004/error-when-adding-user-that-is-not-in-parent-thread-to-open-subchannel no longer appears in any thread.

Diff Detail

Repository

rCOMM Comm

Branch

jacek/add-user-issue

Lint

No Lint Coverage

Unit

No Test Coverage

Event Timeline

• jacek created this revision.May 18 2022, 10:47 AM

Herald added subscribers: • abosh, atul, • adrian. · View Herald TranscriptMay 18 2022, 10:47 AM

Harbormaster completed remote builds in B9260: Diff 12857.May 18 2022, 10:55 AM

• jacek requested review of this revision.May 18 2022, 10:55 AM

• jacek added a parent revision: D4069: [server] [fix] Unify permissions for `OPEN_SUBTHREAD` and `SECRET_SUBTHREAD`.May 18 2022, 1:28 PM

ashoat requested changes to this revision.May 19 2022, 3:00 AM

ashoat added inline comments.

keyserver/src/updaters/thread-permission-updaters.js
1227–1233	I'm worried that we can't do this in batches because we need to do parents before children, since parents' permissions can affect children. Are you sure we're safe doing in batches? Is there a downside to just doing `recalculateAllThreadPermissions`?

This revision now requires changes to proceed.May 19 2022, 3:00 AM

• jacek added inline comments.May 20 2022, 9:56 AM

keyserver/src/updaters/thread-permission-updaters.js
1227–1233	Yes. I suppose you are right, that doing batches may not be safe. The similar code was originally used in D1029, but the sequence of updating PRIVATE (parent threads) and SIDEBAR (child threads) was guaranteed there. The reason I didn't use `recalculateAllThreadPermissions` is because it doesn't execute `updateRoles` function (which calls `getRolePermissionBlobs`). My test confirmed, that it doesn't solve the permissions in our case and original issue still persists for existing threads.

Ah, makes sense. Sounds like we'll need a new function, but let's avoid doing things in batches.

Process threads sorted by depth to avoid processing parent threads before children

Harbormaster completed remote builds in B9389: Diff 13007.May 23 2022, 1:41 PM

• jacek planned changes to this revision.May 23 2022, 1:45 PM

ashoat requested changes to this revision.May 23 2022, 3:54 PM

ashoat added inline comments.

keyserver/src/database/migrations.js
91–93 ↗	(On Diff #13007)	This can be simplified
keyserver/src/updaters/thread-permission-updaters.js
1223 ↗	(On Diff #13007)	I wonder if there is an easy way to refactor this to use your new depth-based approach. No need to spend a lot of time on this, but it seems like the only difference is `updateRoles`? So I wonder if we could have a generic version of `updateRolesAndPermissionsForAllThreads` that takes a callback that returns a `Promise`. And then `recalculateAllThreadPermissions` would call it without the callback, and your function would call it with `updateRoles` as the callback
1242–1285 ↗	(On Diff #13007)	This looks great!! Great thinking @def-au1t and @palys-swm to use the new `depth` column

add delay before executing transaction to avoid restarting migration in dev environment

• jacek planned changes to this revision.May 24 2022, 11:53 AM

Harbormaster completed remote builds in B9432: Diff 13058.May 24 2022, 11:56 AM

• jacek requested review of this revision.May 27 2022, 1:13 PM

• jacek added inline comments.

keyserver/src/updaters/thread-permission-updaters.js
1223 ↗	(On Diff #13007)	Maybe it would be better, instead of passing the callback, just pass flag as parameter like `updateRoles: boolean` in `updateRolesAndPermissionsForAllThreads`? The `updateRoles` takes three parameters, so the callback inside the generic function would also need to take these exact parameters, what would make the function not much generic. I think it should be out of the scope of this diff, and I'll create a separate task for investigating depth-based approach in existing migration functions.

• jacek added inline comments.May 27 2022, 1:23 PM

keyserver/src/updaters/thread-permission-updaters.js
1223 ↗	(On Diff #13007)	Created task for this: https://linear.app/comm/issue/ENG-1198/refactor-permissions-updates-using-depth-based-approach

ashoat accepted this revision.May 27 2022, 2:33 PM

ashoat added inline comments.

keyserver/src/database/migrations.js
106 ↗	(On Diff #13058)	Can we wrap this in a condition to check dev mode? if (process.env.NODE_ENV === 'development') {
keyserver/src/updaters/thread-permission-updaters.js
1223 ↗	(On Diff #13007)	Makes sense that we would want to avoid the callback, but I think having a `boolean` parameter isn't great for readability... maybe we can make an enum `undefined \| 'update_role_permissions'`? And thanks for creating a Backlog task... that makes sense to me.

This revision is now accepted and ready to land.May 27 2022, 2:33 PM

add condition to delay migrations only in dev mode

Harbormaster completed remote builds in B9534: Diff 13220.May 31 2022, 8:29 AM

While performing last tests just before landing, I noticed, that it still sometimes happens (although less frequently than before) to deadlock database during the migration - in both dev and production modes. It seems to happen on different queries and is non-deterministic (I perform migration from the same database state from backup).

I'll hold-off landing this diff, until I investigate the issue further.

How bad is the deadlock? dbQuery is set up to retry a query twice (for a total of 3 tries) if it sees a deadlock.

In D4071#117511, @ashoat wrote:

How bad is the deadlock? dbQuery is set up to retry a query twice (for a total of 3 tries) if it sees a deadlock.

Yes, it is, but interesting thing here is that dbQuery restarts query when ER_LOCK_DEADLOCK is thrown, but in our case database throws ER_LOCK_WAIT_TIMEOUT error, what prevents transaction from being restarted and app immediately crashes due to failed migration.
I'm not sure what are the differences between these two kinds of error, but it seems, (following https://dev.mysql.com/doc/mysql-errors/5.7/en/server-error-reference.html), that deadlock is detected automatically by database system and lock_timeout is not (so it takes some time before it's thrown - by default 50s).

We can consider if throwing such error with lock timeout should also trigger transaction restart in dbQuery

Right now, locks still occur in my case with less than 50% tries on both prod and dev - even when doing updates with batch size of 1. I'm experimenting with wrapping each batch into a separate transaction instead of running it in one for migration, ~~and it seems to help a lot for now~~, but I'm still testing it.

That's very strange... I haven't seen that sort of deadlock in the past when running migrations like this. I seem to recall we were getting some deadlocks due to nodemon – have you tried running this via yarn prod instead of yarn dev to confirm nodemon is not at fault?

In D4071#118719, @ashoat wrote:

That's very strange... I haven't seen that sort of deadlock in the past when running migrations like this. I seem to recall we were getting some deadlocks due to nodemon – have you tried running this via yarn prod instead of yarn dev to confirm nodemon is not at fault?

Yes, I tried. It seems not to be related to nodemon, as deadlock's happens in both yarn prod and yarn dev

• jacek mentioned this in D4514: [server] Use single connection with DB for migrations.Jul 13 2022, 1:55 PM

• jacek added a parent revision: D4514: [server] Use single connection with DB for migrations.Jul 13 2022, 3:06 PM

rebase

This revision is now accepted and ready to land.Jul 13 2022, 3:07 PM

Harbormaster completed remote builds in B10489: Diff 14436.Jul 13 2022, 3:19 PM

Closed by commit rCOMM762261bd4c4f: [server] Migrate database after permission updates. · Explain WhyJul 15 2022, 10:08 AM

This revision was automatically updated to reflect the committed changes.

• jacek mentioned this in rCOMMdd58826093a1: [server] Use single connection with DB for migrations.

• jacek added a commit: rCOMM762261bd4c4f: [server] Migrate database after permission updates.