[terraform] Ensure that the primary service is running before deploying secondary service
Closed, Public

Authored by will on Jul 11 2024, 10:12 AM.

Details

Summary

While writing the migration script, I realized that we likely want the primary node running before the secondary nodes start on the very first deployment as well.

This diff orders the secondary node deployments behind the primary by waiting for a 200 OK from https://wyilio.com/health before creating them.
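For readers who want to see the shape of the wait, here is a minimal sketch of a "poll until 200 OK" loop. It is an illustration only: the actual diff uses a bash script, and the function names (`fetch_status`, `wait_until_healthy`), the retry count, and the delay below are assumptions, not values taken from the diff.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503).
        return err.code
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # DNS not resolving yet, connection refused, timeout, and so on.
        return 0


def wait_until_healthy(
    url: str,
    probe: Callable[[str], int] = fetch_status,
    max_attempts: int = 60,        # hypothetical default
    delay_seconds: float = 10.0,   # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until probe reports 200; return False if attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if probe(url) == 200:
            return True
        if attempt < max_attempts:
            # Wait between attempts, but not after the final one.
            sleep(delay_seconds)
    return False
```

Calling `wait_until_healthy("https://wyilio.com/health")` would block until the endpoint answers 200 or the attempts are exhausted. The `probe` and `sleep` parameters are injectable so the loop can be exercised without network access or real waiting.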

Depends on D12729

Test Plan

I ran terraform destroy on my self-host setup and then ran through an initial deployment. The script looped before creating the secondary keyserver service.
I had to add the new load balancer endpoint to my Squarespace DNS rules and wait for the primary node to come online. Once it was online, the script
received a 200 OK and went on to create the secondary nodes.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

will requested review of this revision. Jul 11 2024, 10:28 AM
services/terraform/self-host/keyserver_secondary.tf
136–144 ↗(On Diff #42238)

The indentation in this part is weird. Does this work?

137–144 ↗(On Diff #42238)

The indentation in this part is weird

@will mentioned a concern I had raised in a 1:1 here. Not sure if we want to address that here, but broadly I'm worried that the health check might not work if we're doing it after bringing the load balancer publicly online with only a single node (the primary) accessible

This is a valid concern. I think this can be solved with some smart load balancer & network configuration (IIRC there is a way to prioritize health checks traffic in AWS), but I need to research on what's the best way of doing that. I think we can figure it out later.

This revision is now accepted and ready to land. Jul 17 2024, 1:04 AM

Looks like @will figured out a solution within the bash script in D12731

> I think this can be solved with some smart load balancer & network configuration (IIRC there is a way to prioritize health checks traffic in AWS), but I need to research on what's the best way of doing that. I think we can figure it out later.

This sounds like potentially a better solution, but the current solution in D12731 works for now – agree we can address it later. @will maybe you can create a follow-up task before landing to investigate @bartek's proposal here?

Issue created here: https://linear.app/comm/issue/ENG-8840/investigate-whether-we-want-go-with-taking-down-the-load-balancer