Page MenuHomePhabricator

[server] Migrate database after permission updates
ClosedPublic

Authored by jacek on May 18 2022, 3:47 AM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Nov 5, 9:59 PM
Unknown Object (File)
Tue, Nov 5, 9:59 PM
Unknown Object (File)
Tue, Nov 5, 9:59 PM
Unknown Object (File)
Tue, Nov 5, 9:58 PM
Unknown Object (File)
Tue, Nov 5, 9:58 PM
Unknown Object (File)
Tue, Nov 5, 9:58 PM
Unknown Object (File)
Tue, Nov 5, 11:25 AM
Unknown Object (File)
Sat, Nov 2, 9:48 AM

Details

Summary

The diff migrates database to recalculate perimissions after changing default permissions for OPEN_SUBTHREADs.
I put updateRolesAndPermissionsForAllThreads function in thread-permission-updaters, but the function itself is taken from add-edit-thread-detailed-permissions.js migration script introduced in D1979.
As the script did exactly the same thing - recalculated permissions after introducing changes in default permission configuration - and we no longer use seaprate scripts for migrations, I think, it's better to put this function in updaters so it could be reused in the future if there is a need.

Test Plan

Migration is performed automatically when server starts. After the migration the issue: https://linear.app/comm/issue/ENG-1004/error-when-adding-user-that-is-not-in-parent-thread-to-open-subchannel no longer appears in any thread.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

ashoat requested changes to this revision.May 18 2022, 8:00 PM
ashoat added inline comments.
keyserver/src/updaters/thread-permission-updaters.js
1227–1233 ↗(On Diff #12857)

I'm worried that we can't do this in batches because we need to do parents before children, since parents' permissions can affect children. Are you sure we're safe doing in batches? Is there a downside to just doing recalculateAllThreadPermissions?

This revision now requires changes to proceed.May 18 2022, 8:00 PM
keyserver/src/updaters/thread-permission-updaters.js
1227–1233 ↗(On Diff #12857)

Yes. I suppose you are right, that doing batches may not be safe. The similar code was originally used in D1029, but the sequence of updating PRIVATE (parent threads) and SIDEBAR (child threads) was guaranteed there.

The reason I didn't use recalculateAllThreadPermissions is because it doesn't execute updateRoles function (which calls getRolePermissionBlobs). My test confirmed, that it doesn't solve the permissions in our case and original issue still persists for existing threads.

Ah, makes sense. Sounds like we'll need a new function, but let's avoid doing things in batches.

Process threads sorted by depth to avoid processing parent threads before children

ashoat requested changes to this revision.May 23 2022, 8:54 AM
ashoat added inline comments.
keyserver/src/database/migrations.js
91–93 ↗(On Diff #13007)

This can be simplified

keyserver/src/updaters/thread-permission-updaters.js
1223 ↗(On Diff #13007)

I wonder if there is an easy way to refactor this to use your new depth-based approach. No need to spend a lot of time on this, but it seems like the only difference is updateRoles? So I wonder if we could have a generic version of updateRolesAndPermissionsForAllThreads that takes a callback that returns a Promise. And then recalculateAllThreadPermissions would call it without the callback, and your function would call it with updateRoles as the callback

1242–1285 ↗(On Diff #13007)

This looks great!! Great thinking @def-au1t and @palys-swm to use the new depth column

add delay before executing transaction to avoid restarting migration in dev environment

jacek added inline comments.
keyserver/src/updaters/thread-permission-updaters.js
1223 ↗(On Diff #13007)

Maybe it would be better, instead of passing the callback, just pass flag as parameter like updateRoles: boolean in updateRolesAndPermissionsForAllThreads? The updateRoles takes three parameters, so the callback inside the generic function would also need to take these exact parameters, what would make the function not much generic.

I think it should be out of the scope of this diff, and I'll create a separate task for investigating depth-based approach in existing migration functions.

keyserver/src/updaters/thread-permission-updaters.js
1223 ↗(On Diff #13007)
ashoat added inline comments.
keyserver/src/database/migrations.js
106 ↗(On Diff #13058)

Can we wrap this in a condition to check dev mode?

if (process.env.NODE_ENV === 'development') {
keyserver/src/updaters/thread-permission-updaters.js
1223 ↗(On Diff #13007)

Makes sense that we would want to avoid the callback, but I think having a boolean parameter isn't great for readability... maybe we can make an enum undefined | 'update_role_permissions'?

And thanks for creating a Backlog task... that makes sense to me.

This revision is now accepted and ready to land.May 27 2022, 7:33 AM

add condition to delay migrations only in dev mode

While performing last tests just before landing, I noticed, that it still sometimes happens (although less frequently than before) to deadlock database during the migration - in both dev and production modes. It seems to happen on different queries and is non-deterministic (I perform migration from the same database state from backup).

I'll hold-off landing this diff, until I investigate the issue further.

How bad is the deadlock? dbQuery is set up to retry a query twice (for a total of 3 tries) if it sees a deadlock.

How bad is the deadlock? dbQuery is set up to retry a query twice (for a total of 3 tries) if it sees a deadlock.

Yes, it is, but interesting thing here is that dbQuery restarts query when ER_LOCK_DEADLOCK is thrown, but in our case database throws ER_LOCK_WAIT_TIMEOUT error, what prevents transaction from being restarted and app immediately crashes due to failed migration.
I'm not sure what are the differences between these two kinds of error, but it seems, (following https://dev.mysql.com/doc/mysql-errors/5.7/en/server-error-reference.html), that deadlock is detected automatically by database system and lock_timeout is not (so it takes some time before it's thrown - by default 50s).

We can consider if throwing such error with lock timeout should also trigger transaction restart in dbQuery

Right now, locks still occur in my case with less than 50% tries on both prod and dev - even when doing updates with batch size of 1. I'm experimenting with wrapping each batch into a separate transaction instead of running it in one for migration, and it seems to help a lot for now, but I'm still testing it.

That's very strange... I haven't seen that sort of deadlock in the past when running migrations like this. I seem to recall we were getting some deadlocks due to nodemon – have you tried running this via yarn prod instead of yarn dev to confirm nodemon is not at fault?

That's very strange... I haven't seen that sort of deadlock in the past when running migrations like this. I seem to recall we were getting some deadlocks due to nodemon – have you tried running this via yarn prod instead of yarn dev to confirm nodemon is not at fault?

Yes, I tried. It seems not to be related to nodemon, as deadlock's happens in both yarn prod and yarn dev

This revision is now accepted and ready to land.Jul 13 2022, 8:07 AM