[terraform] Ensure that the primary service is running before deploying secondary service
Closed · Public

Authored by will on Jul 11 2024, 5:12 PM.

Details

Reviewers

varun
bartek

Commits

rCOMMc46b08343400: [terraform] Ensure that the primary service is running before deploying…

Summary

While writing the migration script, I realized that we likely want the primary node to be running before the secondary nodes start, on the very first deployment as well.

This diff orders the secondary node deployments behind the primary by polling https://wyilio.com/health until it returns 200 OK.
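A minimal sketch of the kind of wait loop described above, written as a POSIX shell script. It is not the script from this diff, which isn't shown on this page. The function names `http_status` and `wait_for_primary`, the defaults of 60 attempts spaced 10 seconds apart, and the 10-second per-request timeout are all hypothetical. It assumes `curl` is installed and that the primary reports healthy with a plain 200 OK.

```sh
#!/bin/sh
# Hypothetical sketch of a "wait for the primary" step; not the script from this diff.
# Assumes curl is installed and the primary answers 200 OK on its health URL.
set -eu

# Print the HTTP status code for a URL. curl prints "000" when no response
# is received; "|| true" keeps a failed request from aborting under set -e.
http_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --max-time 10 "$1" || true
}

# Poll a URL until it returns 200, making at most MAX_ATTEMPTS requests
# spaced INTERVAL seconds apart. Returns 0 once healthy, 1 otherwise.
# Usage: wait_for_primary URL [MAX_ATTEMPTS] [INTERVAL]
wait_for_primary() {
  url=$1
  max_attempts=${2:-60}   # hypothetical default
  interval=${3:-10}       # hypothetical default, in seconds
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(http_status "$url")
    if [ "$status" = "200" ]; then
      echo "Primary healthy after $attempt attempt(s)"
      return 0
    fi
    # Only sleep if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      echo "Primary not ready (HTTP $status), retrying in ${interval}s" >&2
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "Primary not healthy after $max_attempts attempts" >&2
  return 1
}
```

For example, `wait_for_primary https://wyilio.com/health 60 10` keeps polling for up to 60 attempts, 10 seconds apart, and returns non-zero if the primary never comes up, which stops a script running under `set -e`. How the diff actually hooks its check into Terraform isn't shown here; one assumed option is a `null_resource` with a `local-exec` provisioner that the secondary service lists in `depends_on`.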

Depends on D12729

Test Plan

I ran terraform destroy on my self-hosted setup and then performed an initial deployment. The script looped before creating the secondary keyserver service.
I had to add the new load balancer endpoint to my Squarespace DNS rules and wait for the primary node to come online. Once it was online, the script
received a 200 OK and went on to create the secondary nodes.

Diff Detail

Repository

rCOMM Comm

Lint

Lint Not Applicable

Unit

Tests Not Applicable

Event Timeline

will created this revision. Jul 11 2024, 5:12 PM

Herald added subscribers: tomek, ashoat. Jul 11 2024, 5:12 PM

Harbormaster completed remote builds in B30291: Diff 42238. Jul 11 2024, 5:28 PM

will requested review of this revision. Jul 11 2024, 5:28 PM

will added a child revision: D12731: [terraform] Migration script for for self-hosted keyserver. Jul 11 2024, 6:50 PM

ashoat added inline comments. Jul 12 2024, 3:58 AM

services/terraform/self-host/keyserver_secondary.tf
Lines 136–144 (On Diff #42238): The indentation in this part is weird. Does this work?
Lines 137–144 (On Diff #42238): The indentation in this part is weird.

@will mentioned a concern I had raised in a 1:1 here. I'm not sure whether we want to address it in this diff, but broadly I'm worried that the health check might not work if we run it after bringing the load balancer publicly online with only a single node (the primary) accessible.

In D12730#360482, @ashoat wrote:

@will mentioned a concern I had raised in a 1:1 here. I'm not sure whether we want to address it in this diff, but broadly I'm worried that the health check might not work if we run it after bringing the load balancer publicly online with only a single node (the primary) accessible.

This is a valid concern. I think this can be solved with some smart load balancer and network configuration (IIRC there is a way to prioritize health check traffic in AWS), but I need to research what the best way of doing that is. I think we can figure it out later.

This revision is now accepted and ready to land. Jul 17 2024, 8:04 AM

Looks like @will figured out a solution within the bash script in D12731.

I think this can be solved with some smart load balancer and network configuration (IIRC there is a way to prioritize health check traffic in AWS), but I need to research what the best way of doing that is. I think we can figure it out later.

This sounds like it could be a better solution, but the current solution in D12731 works for now; agreed that we can address it later. @will, maybe you can create a follow-up task before landing to investigate @bartek's proposal here?

In D12730#361895, @ashoat wrote:

Looks like @will figured out a solution within the bash script in D12731.

I think this can be solved with some smart load balancer and network configuration (IIRC there is a way to prioritize health check traffic in AWS), but I need to research what the best way of doing that is. I think we can figure it out later.

This sounds like it could be a better solution, but the current solution in D12731 works for now; agreed that we can address it later. @will, maybe you can create a follow-up task before landing to investigate @bartek's proposal here?