Page MenuHomePhabricator

[keyserver] Move update-geoip to an earlier step in Docker build
ClosedPublic

Authored by ashoat on Apr 9 2023, 7:36 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jul 3, 6:47 AM
Unknown Object (File)
Sat, Jun 15, 7:55 PM
Unknown Object (File)
Sat, Jun 15, 5:07 AM
Unknown Object (File)
Fri, Jun 14, 4:47 AM
Unknown Object (File)
Fri, Jun 14, 4:04 AM
Unknown Object (File)
Tue, Jun 11, 10:07 AM
Unknown Object (File)
Mon, Jun 10, 8:21 AM
Unknown Object (File)
Sat, Jun 8, 11:31 PM
Subscribers

Details

Summary

The update-geoip script updates our GeoIP database during the Docker build to make sure that it is current. The GeoIP database is otherwise dated as of the last publish of the geoip-lite NPM package that we use.

The goal of this diff is to make it easier for me to iterate on code in the production Docker environment. Right now, because the update-geoip happens after the COPY . . step, it gets re-run after any change, which means I have to wait ~15 minutes to test any change.

In this diff, we move the update-geoip step earlier. But because that necessitates moving it before any Babel transpilation, we can no longer use our own update-geoip script. That script (which is deleted here) basically just strings together a getCommConfig call with a call to the underlying update-geoip script we get from the geoip-lite package.

So this diff ultimately does three things:

  1. Replaces the existing update-geoip script with a new one that only supports the Docker environment, but doesn't require any Babel transpilation
  2. Makes it so the update-geoip script is only called in the Docker environment
  3. Moves the update-geoip script invocation to an earlier step in the Docker build

I think it's fine that update-geoip only supports the Docker environment going forward. It takes a long time and isn't useful for the dev environment.

Test Plan

Ran the Docker build locally, both with and without the GeoIP license key being set in keyserver/.env

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed.Apr 9 2023, 7:50 AM
Harbormaster failed remote builds in B18176: Diff 24888!

Add condition to script invocation in case license key is not present

ashoat requested review of this revision.Apr 9 2023, 9:37 AM
keyserver/package.json
14 ↗(On Diff #24889)

The reason I put this here instead of the Dockerfile was that it was the only way to keep Docker from printing the actual GeoIP license to the screen during build

atul added inline comments.
keyserver/package.json
14 ↗(On Diff #24889)

Personally think it'd be cleaner to just move this to it's own sh file in scripts, but either way is fine.

This revision is now accepted and ready to land.Apr 9 2023, 1:19 PM

Extracting to a bash file makes sense... it's way more readable, and we're able to get Shellcheck to run on it. That said, Shellcheck was a bit confused on one line (I think) – would love to get @jon's take

keyserver/bash/docker-update-geoip.sh
12 ↗(On Diff #24891)

Here's what these rules were saying:

^-- SC2001 (style): See if you can use ${variable//search/replace} instead.
       ^-- SC2027 (warning): The surrounding quotes actually unquote this. Remove or escape them.
       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
  • SC2001 – I wasn't able to use bash find-and-replace substitution feature as it doesn't appear to support RegEx capture groups
  • SC2027, SC2086 – I'm aware, this is my intention. Everything is double-quoted, but I double-quote the variable separately. The only way I could get Shellcheck to stop complaining was if I left just a single set of quotes around the variable, but that has the effect of leaving the variable unquoted
This revision now requires review to proceed.Apr 9 2023, 8:13 PM
keyserver/bash/docker-update-geoip.sh
14 ↗(On Diff #24893)

Here's what these rules were saying:

^-- SC2001 (style): See if you can use ${variable//search/replace} instead.
       ^-- SC2027 (warning): The surrounding quotes actually unquote this. Remove or escape them.
       ^-- SC2086 (info): Double quote to prevent globbing and word splitting.
  • SC2001 – I wasn't able to use bash find-and-replace substitution feature as it doesn't appear to support RegEx capture groups
  • SC2027, SC2086 – I'm aware, this is my intention. Everything is double-quoted, but I double-quote the variable separately. The only way I could get Shellcheck to stop complaining was if I left just a single set of quotes around the variable, but that has the effect of leaving the variable unquoted

@jon – ping on this, please make sure to monitor your review queue!

jon requested changes to this revision.Apr 11 2023, 8:10 AM
jon added inline comments.
keyserver/bash/docker-update-geoip.sh
6–10 ↗(On Diff #24893)

declare is not really necessary here, it doesn't have much of an effect if it wasnt defined. And will just redefine the variable to the same value it was defined already.

In case another calling script has set -u, I added ${FOO-}. The - substitution will avoid set -u causing a failure.

15 ↗(On Diff #24893)

I think the errors are referencing the extra double quotes

This revision now requires changes to proceed.Apr 11 2023, 8:10 AM
ashoat added inline comments.
keyserver/bash/docker-update-geoip.sh
6–10 ↗(On Diff #24893)

I think declare is only here for shellcheck. I'll check if Shellcheck will drop it if I use the substitution trick

15 ↗(On Diff #24893)

Yeah, that's what Shellcheck wanted me to do. But I think it's unsafe - doesn't that mean $COMM_JSONCONFIG_secrets_geoip_license won't be quoting? I think the open quote is actually closing the quote from line 13, and the close quote is actually opening a new quote. So that leaves $COMM_JSONCONFIG_secrets_geoip_license unquoted

Re-requesting your review for this

switched [[ -n to [[ -r, as I realized later that COMM_JSONCONFIG_secrets_geoip_license was a file path to a json config.

keyserver/bash/docker-update-geoip.sh
15 ↗(On Diff #24893)

For SC2001, I think it was hinting that stuff like echo | sed can often be replaced with:

$ FOO=var-baz
$ echo ${FOO//var/bar}
bar-baz

However, since you're doing a regex replacement, I would just silence it. I would write:

#!/usr/bin/env bash

# run as: Docker container user
# run from: keyserver dir

set -euo pipefail

if [[ -r"${COMM_JSONCONFIG_secrets_geoip_license-}" ]]; then
  # shellcheck source=/dev/null
  . bash/source-nvm.sh
  node node_modules/geoip-lite/scripts/updatedb.js license_key="$( \
    # shellcheck disable=SC2001
    echo "$COMM_JSONCONFIG_secrets_geoip_license" \
      | sed 's/{\"key\":\"\([a-zA-Z0-9]*\)\"}/\1/' \
  )"
fi

which shellcheck seems to be happy with.

My preference would also be to resolve the script location, then work from a known directory:

#!/usr/bin/env bash

# run as: Docker container user

set -euo pipefail

SCRIPT_DIR=$(cd "$(dirname "$0")"; pwd -P)

if [[ -r "${COMM_JSONCONFIG_secrets_geoip_license-}" ]]; then
  . "${SCRIPT_DIR}/source-nvm.sh"
  node "${SCRIPT_DIR}/../node_modules/geoip-lite/scripts/updatedb.js" \
    license_key="$( \
      # shellcheck disable=SC2001
      echo "$COMM_JSONCONFIG_secrets_geoip_license" \
        | sed 's/{\"key\":\"\([a-zA-Z0-9]*\)\"}/\1/' \
    )"
fi

which shellcheck seems to be happy with as well.

If we had jq available, then I would do:

#!/usr/bin/env bash

# run as: Docker container user

set -euo pipefail

SCRIPT_DIR=$(cd "$(dirname "$0")"; pwd -P)

if [[ -r "${COMM_JSONCONFIG_secrets_geoip_license-}" ]]; then
  . "${SCRIPT_DIR}/source-nvm.sh"
  license_key=$(jq -r '.key' "$COMM_JSONCONFIG_secrets_geoip_license")
  node "${SCRIPT_DIR}/../node_modules/geoip-lite/scripts/updatedb.js" \
    license_key="$license_key"
fi
jon requested changes to this revision.Apr 11 2023, 9:32 AM
This revision now requires changes to proceed.Apr 11 2023, 9:32 AM
ashoat requested review of this revision.EditedApr 11 2023, 9:57 AM

You addressed a lot of things, but you didn't address the thing I asked for re-review on. Can you respond to this please?

Yeah, that's what Shellcheck wanted me to do. But I think it's unsafe - doesn't that mean $COMM_JSONCONFIG_secrets_geoip_license won't be quoted? I think the open quote is actually closing the quote from line 13, and the close quote is actually opening a new quote. So that leaves $COMM_JSONCONFIG_secrets_geoip_license unquoted

Re-requesting your review for this

As for this:

COMM_JSONCONFIG_secrets_geoip_license was a file path to a json config

This is not true. COMM_JSONCONFIG variables are never file paths, they are always JSON blobs.

Yeah, that's what Shellcheck wanted me to do. But I think it's unsafe - doesn't that mean $COMM_JSONCONFIG_secrets_geoip_license won't be quoted? I think the open quote is actually closing the quote from line 13, and the close quote is actually opening a new quote. So that leaves $COMM_JSONCONFIG_secrets_geoip_license unquoted

Correct, it won't be quoted. It's just two empty strings on either side of a string literal. In other words, ""foo"" is the same as foo in bash, likewise ""$foo"" is the same as $foo.

$ [[ ""foo"" == foo ]] && echo true
true

I think shellcheck is getting tripped up on what you were trying to work around, and then you ended up disabling 3 rules, when it should have just been the original one about replacement.

This is not true. COMM_JSONCONFIG variables are never file paths, they are always JSON blobs.

Oh, I was trying to figure out why sed was involved and got stuck on it's default behavior of expecting a filepath when stdin isn't bound. Couldn't you just construct license_key="{\"key\":\"\${COMM_JSONCONFIG_secrets_geoip_license}\"}"?

This revision now requires changes to proceed.Apr 11 2023, 10:46 AM

COMM_JSONCONFIG variables are never file paths, they are always JSON blobs.

I think storing JSON payloads in bash variables is highly unusual. COMM_JSONCONFIG_SECRETS_GEOIP_LICENSE_KEY containing a license key seems more inline with what I would expect for service configuration than "COMM_JSONCONFIG_secrets_geoip_license is an escaped JSON payload with a top-level object which only has a key called 'key', which the value contains a license key".

I still do not understand if it's dangerous to leave the variable unquoted. You and Shellcheck both seem to think that bash variables need to be quoted, but you are both suggesting a solution where it is not quoted. Can you please clarify why you think it's safe to leave the variable unquoted?

Synced with @jon offline. We came to the conclusion that bash does not close the double quote in "$( until it encounters a )", so we're safe to use a double quote inside the subshell

Also addressed the rest of @jon's feedback and re-tested (both with and without the GeoIP license key existing). We only have one Shellcheck override here now

keyserver/bash/docker-update-geoip.sh
8 ↗(On Diff #25008)

We're actually doing the same thing here: double quotes inside double quotes. It's okay because the "$( can only be closed by )"

keyserver/bash/docker-update-geoip.sh
11 ↗(On Diff #25008)

Shellcheck still doesn't like this:

. "${SCRIPT_DIR}/source-nvm.sh"
  ^---------------------------^ SC1090: Can't follow non-constant source. Use a directive to specify location.

Looks like I'm gonna need to re-add the # shellcheck source=/dev/null unfortunately...

Re-add # shellcheck source=/dev/null

This revision is now accepted and ready to land.Apr 11 2023, 3:06 PM