Page MenuHomePhabricator

[keyserver] stall secondary nodes until database version is equal to or greater than latest wrapped in transaction request blocking migration version.
ClosedPublic

Authored by will on Aug 30 2024, 8:49 AM.
Tags
None
Referenced Files
F3343750: D13213.diff
Fri, Nov 22, 3:58 AM
Unknown Object (File)
Wed, Nov 20, 11:51 AM
Unknown Object (File)
Wed, Nov 20, 11:50 AM
Unknown Object (File)
Sat, Nov 9, 3:17 PM
Unknown Object (File)
Sat, Nov 9, 2:43 PM
Unknown Object (File)
Sat, Nov 9, 8:53 AM
Unknown Object (File)
Sat, Nov 9, 8:42 AM
Unknown Object (File)
Sat, Nov 9, 8:33 AM
Subscribers

Details

Summary

This blocks the secondary nodes from accepting any other traffic other than health checks while the current dbVersion is less than the latest migration specifying wrap_in_transaction_and_block_requests.

Depends on D13212

Test Plan

I created two migrations and a new keyserver docker image.

Both migrations included await sleep. The first migration specified wrap_in_transaction_and_block_requests and the second migration specified run_simultaneously_with_requests.

While the first migration was running (and the primary node was unavailable due to running the migration), only health checks were available on the load balancer, meaning that the secondary nodes were only accepting health check traffic.

I also console logged to ensure the loop was running. On the second migration however, all endpoints were available for secondary nodes.

Diff Detail

Repository
rCOMM Comm
Branch
optional_migration_takedowns
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Aug 30 2024, 8:59 AM
Harbormaster failed remote builds in B31373: Diff 43817!
will requested review of this revision.Aug 30 2024, 9:56 AM

In D13172, you had to stop the Express server after starting it, which forced you to introduce a new dependency (stoppable). Why was that not necessary here?

keyserver/src/keyserver.js
211–216

You could probably dedup lines 211 and 215 if you use a do-while loop

This revision is now accepted and ready to land.Sep 4 2024, 11:08 PM

In D13172, you had to stop the Express server after starting it, which forced you to introduce a new dependency (stoppable). Why was that not necessary here?

In this diff, we don't use two express servers. Express servers start with only health checks. Endpoints are dynamically added after we confirm the migration has occurred.

In D13172, we have an express server initialized in the master process. This express server didn't exist before. The situation was that we need the health check to be available during the migration and so needed to be initialized in the master process. However, this express server conflicts with later express servers in non-master processes that listen on the same port.

For example, if there's a hanging request on the master express server, the actual keyserver endpoints would never become available until all requests are resolved. This unfortunately doesn't seem to be guaranteed unless we use something like stoppable to forcibly shutdown the master express server so non-master servers can listen on the port.

This diff doesn't require the health check to exist prior to any non-master process code and so we can simply start with a health check and add endpoints later with a single express server.

will marked an inline comment as done.Sep 8 2024, 8:34 PM

update latest version to block on with default

keyserver/src/keyserver.js
213 ↗(On Diff #43960)

Why'd you opt to sleep first? Doesn't this slow down all secondary node startup by 5s?

keyserver/src/keyserver.js
213 ↗(On Diff #43960)

Even if we slept after we set dbVersion, wouldn't it still slow down secondary node startup by 5s? Was wondering if we needed to do another dbVersion < latestWrapInTransactionAndBlockRequestsVersion check inside the do block, but thought against it.

Code clarity wise, it likely makes more sense to flip them, so I'll do that right now.

keyserver/src/keyserver.js
213 ↗(On Diff #43960)