The diff migrates database to recalculate perimissions after changing default permissions for OPEN_SUBTHREADs.
I put updateRolesAndPermissionsForAllThreads function in thread-permission-updaters, but the function itself is taken from add-edit-thread-detailed-permissions.js migration script introduced in D1979.
As the script did exactly the same thing - recalculated permissions after introducing changes in default permission configuration - and we no longer use seaprate scripts for migrations, I think, it's better to put this function in updaters so it could be reused in the future if there is a need.
Details
Migration is performed automatically when server starts. After the migration the issue: https://linear.app/comm/issue/ENG-1004/error-when-adding-user-that-is-not-in-parent-thread-to-open-subchannel no longer appears in any thread.
Diff Detail
- Repository
- rCOMM Comm
- Branch
- jacek/add-user-issue
- Lint
No Lint Coverage - Unit
No Test Coverage
Event Timeline
keyserver/src/updaters/thread-permission-updaters.js | ||
---|---|---|
1227–1233 | I'm worried that we can't do this in batches because we need to do parents before children, since parents' permissions can affect children. Are you sure we're safe doing in batches? Is there a downside to just doing recalculateAllThreadPermissions? |
keyserver/src/updaters/thread-permission-updaters.js | ||
---|---|---|
1227–1233 | Yes. I suppose you are right, that doing batches may not be safe. The similar code was originally used in D1029, but the sequence of updating PRIVATE (parent threads) and SIDEBAR (child threads) was guaranteed there. The reason I didn't use recalculateAllThreadPermissions is because it doesn't execute updateRoles function (which calls getRolePermissionBlobs). My test confirmed, that it doesn't solve the permissions in our case and original issue still persists for existing threads. |
Ah, makes sense. Sounds like we'll need a new function, but let's avoid doing things in batches.
keyserver/src/database/migrations.js | ||
---|---|---|
91–93 ↗ | (On Diff #13007) | This can be simplified |
keyserver/src/updaters/thread-permission-updaters.js | ||
1223 ↗ | (On Diff #13007) | I wonder if there is an easy way to refactor this to use your new depth-based approach. No need to spend a lot of time on this, but it seems like the only difference is updateRoles? So I wonder if we could have a generic version of updateRolesAndPermissionsForAllThreads that takes a callback that returns a Promise. And then recalculateAllThreadPermissions would call it without the callback, and your function would call it with updateRoles as the callback |
1242–1285 ↗ | (On Diff #13007) | This looks great!! Great thinking @def-au1t and @palys-swm to use the new depth column |
add delay before executing transaction to avoid restarting migration in dev environment
keyserver/src/updaters/thread-permission-updaters.js | ||
---|---|---|
1223 ↗ | (On Diff #13007) | Maybe it would be better, instead of passing the callback, just pass flag as parameter like updateRoles: boolean in updateRolesAndPermissionsForAllThreads? The updateRoles takes three parameters, so the callback inside the generic function would also need to take these exact parameters, what would make the function not much generic. I think it should be out of the scope of this diff, and I'll create a separate task for investigating depth-based approach in existing migration functions. |
keyserver/src/updaters/thread-permission-updaters.js | ||
---|---|---|
1223 ↗ | (On Diff #13007) |
keyserver/src/database/migrations.js | ||
---|---|---|
106 ↗ | (On Diff #13058) | Can we wrap this in a condition to check dev mode? if (process.env.NODE_ENV === 'development') { |
keyserver/src/updaters/thread-permission-updaters.js | ||
1223 ↗ | (On Diff #13007) | Makes sense that we would want to avoid the callback, but I think having a boolean parameter isn't great for readability... maybe we can make an enum undefined | 'update_role_permissions'? And thanks for creating a Backlog task... that makes sense to me. |
While performing last tests just before landing, I noticed, that it still sometimes happens (although less frequently than before) to deadlock database during the migration - in both dev and production modes. It seems to happen on different queries and is non-deterministic (I perform migration from the same database state from backup).
I'll hold-off landing this diff, until I investigate the issue further.
How bad is the deadlock? dbQuery is set up to retry a query twice (for a total of 3 tries) if it sees a deadlock.
Yes, it is, but interesting thing here is that dbQuery restarts query when ER_LOCK_DEADLOCK is thrown, but in our case database throws ER_LOCK_WAIT_TIMEOUT error, what prevents transaction from being restarted and app immediately crashes due to failed migration.
I'm not sure what are the differences between these two kinds of error, but it seems, (following https://dev.mysql.com/doc/mysql-errors/5.7/en/server-error-reference.html), that deadlock is detected automatically by database system and lock_timeout is not (so it takes some time before it's thrown - by default 50s).
We can consider if throwing such error with lock timeout should also trigger transaction restart in dbQuery
Right now, locks still occur in my case with less than 50% tries on both prod and dev - even when doing updates with batch size of 1. I'm experimenting with wrapping each batch into a separate transaction instead of running it in one for migration, and it seems to help a lot for now, but I'm still testing it.
That's very strange... I haven't seen that sort of deadlock in the past when running migrations like this. I seem to recall we were getting some deadlocks due to nodemon – have you tried running this via yarn prod instead of yarn dev to confirm nodemon is not at fault?
Yes, I tried. It seems not to be related to nodemon, as deadlock's happens in both yarn prod and yarn dev