Page MenuHomePhabricator

[Tunnelbroker] Add docker image
ClosedPublic

Authored by jon on Jun 15 2023, 7:55 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 29, 1:00 PM
Unknown Object (File)
Fri, Nov 29, 11:54 AM
Unknown Object (File)
Sun, Nov 24, 1:52 AM
Unknown Object (File)
Sat, Nov 23, 7:02 AM
Unknown Object (File)
Tue, Nov 12, 8:11 PM
Unknown Object (File)
Thu, Nov 7, 10:07 PM
Unknown Object (File)
Nov 1 2024, 4:10 AM
Unknown Object (File)
Nov 1 2024, 3:31 AM
Subscribers

Details

Summary

Package tunnelbroker into a docker image

https://linear.app/comm/issue/ENG-4103

Test Plan
./services/tunnelbroker/make_docker_image.sh

or

# build all service images
cd services/ && docker-compose build

Diff Detail

Repository
rCOMM Comm
Branch
jonringer/build-tunnelbroker-image (branched from master)
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

varun requested changes to this revision.Jun 16 2023, 12:56 AM

do you not need to call docker-compose build? i ask because for identity, normal docker build doesn't work i think (there is some context in the docker-compose.yml file that is required)

also, should we have a make_docker_image script in identity that does something similar?

This revision now requires changes to proceed.Jun 16 2023, 12:56 AM

do you not need to call docker-compose build? i ask because for identity, normal docker build doesn't work i think (there is some context in the docker-compose.yml file that is required)

That's just a convience to make sure sure that the CWD matches the assumptions in the Dockerfile. You could also do docker build -f services/<service . and it would build fine. It's essentially what docker compose is doing with context: ../

should we have a make_docker_image script in identity that does something similar?

Depends on if it's a big painpoint for someone. Having the 9GB build context was for me.

I have mixed feelings about this script.
On the one hand, it indeed creates a smaller context, but on the other hand, it lacks flexibility and is cumbersome to maintain. Imagine you add a file to the tunnelbroker directory but forget to update the script - it will work perfectly locally but won't be included in docker context -> prod container will fail. If somebody is unaware of this script, one will have a tough time debugging.

I was thinking about tweaking .dockerignore to reduce context size, but probably the keyserver dockerfile depends on this too.
Wondering if there's an easy way to make it more customizable and reusable. For example, a script that reads files to be included from some .dockerinclude files per service.

BTW. Are we going to opt out of using docker-compose at all? Having this script + nix localstack, it would be no longer needed (unless somebody still uses docker in local dev)

BTW. Are we going to opt out of using docker-compose at all? Having this script + nix localstack, it would be no longer needed (unless somebody still uses docker in local dev)

We need Docker Compose for the keyserver config here.

I was thinking about tweaking .dockerignore to reduce context size, but probably the keyserver dockerfile depends on this too.

Yeah, unfortunately that's the case.

Wondering if there's an easy way to make it more customizable and reusable. For example, a script that reads files to be included from some .dockerinclude files per service.

That would be nice. I also wonder if there is some way to use Docker Compose for this, or perhaps there is some third-party open-source solution for it.

[from @jon's code comment] This file exists to make a smaller docker context, so that building it is significantly faster and requires less system resources

Can you be more concrete about the performance / efficiency benefits? I thought that a properly-formatted Dockerfile will only copy in what's necessary, and as a result it should avoid too many issues as a result of unrelated files. But I am probably missed some context.

On the one hand, it indeed creates a smaller context, but on the other hand, it lacks flexibility and is cumbersome to maintain. Imagine you add a file to the tunnelbroker directory but forget to update the script - it will work perfectly locally but won't be included in docker context -> prod container will fail. If somebody is unaware of this script, one will have a tough time debugging.

The tunnelbroker and other directories get copied. The real edge case would be a dependency on another directory (e.g. protos).

I was thinking about tweaking .dockerignore to reduce context size, but probably the keyserver dockerfile depends on this too. Wondering if there's an easy way to make it more customizable and reusable. For example, a script that reads files to be included from some .dockerinclude files per service.

That's why I came to this script. At microsoft, we did something similar to avoid python dev env's from bleeding into the build (similar to node_modules).

Can you be more concrete about the performance / efficiency benefits? I thought that a properly-formatted Dockerfile will only copy in what's necessary, and as a result it should avoid too many issues as a result of unrelated files. But I am probably missed some context.

BTW. Are we going to opt out of using docker-compose at all? Having this script + nix localstack, it would be no longer needed (unless somebody still uses docker in local dev)

docker-compose build still works. I just forgot to mention it in the test plan.

Can you be more concrete about the performance / efficiency benefits? I thought that a properly-formatted Dockerfile will only copy in what's necessary, and as a result it should avoid too many issues as a result of unrelated files. But I am probably missed some context.

Prior to building, docker will create a "context" which copies the CWD into a temporary directory. The root of the project is the current context for the docker image builds, so if you have done a yarn cleaninstall, it's pretty massive, around 9GB. With the script, it's just 100MB of files that need to be copied, so a 90x reduction in file I/O.

Update docker-compose.yml, reduce file image size

Can you be more concrete about the performance / efficiency benefits?

Personally, it was a lot faster in getting the image to initially build

In D8230#243760, @jon wrote:

Can you be more concrete about the performance / efficiency benefits?

Personally, it was a lot faster in getting the image to initially build

This isn't very concrete. Can you try to be more concrete?

Update aws credential location for docker compose

This isn't very concrete. Can you try to be more concrete?

257s vs 2s on my intel mac, so 125x increase for my laptop

=> [internal] load build context                                                            257.2s
 => => transferring context: 8.03GB                                                          256.3s

script:

=> [internal] load build context                                                              2.0s
 => => transferring context: 48.45MB                                                           2.0s

I think the additional 8GB was also causing more RAM thrashing, so it slows down a lot.

For my (linux) desktop, its like 20s vs .5s

services/tunnelbroker/make_docker_image.sh
16–17 ↗(On Diff #27904)

I did this to avoid copying over the target/ directory. Without enabling extglob, there's not a good way to do this with cp. I guess I could use find and -excludedir, but I find find hard to read.

BTW. Are we going to opt out of using docker-compose at all? Having this script + nix localstack, it would be no longer needed (unless somebody still uses docker in local dev)

We need Docker Compose for the keyserver config here.

I was asking about docker-compose in services/, not the keyserver one

In D8230#243746, @jon wrote:

On the one hand, it indeed creates a smaller context, but on the other hand, it lacks flexibility and is cumbersome to maintain. Imagine you add a file to the tunnelbroker directory but forget to update the script - it will work perfectly locally but won't be included in docker context -> prod container will fail. If somebody is unaware of this script, one will have a tough time debugging.

The tunnelbroker and other directories get copied. The real edge case would be a dependency on another directory (e.g. protos).

You specified explicitly which files/directories get copied - if somebody [unaware of this script] added a new file that doesn't match these paths in the script, it won't be copied, that's what I meant.

I was thinking about tweaking .dockerignore to reduce context size, but probably the keyserver dockerfile depends on this too. Wondering if there's an easy way to make it more customizable and reusable. For example, a script that reads files to be included from some .dockerinclude files per service.

That's why I came to this script. At microsoft, we did something similar to avoid python dev env's from bleeding into the build (similar to node_modules).

I see. My comment was about making this script more generic - its shape wont change between services except these paths - maybe there's a way to specify only the paths per-service, while other parts remain shared. But the cost and complexity might not be worth it.

docker-compose build still works. I just forgot to mention it in the test plan.

Yeah, but it doesn't take advantage of this script


Generally, I like the idea of this script, just looking for ways for potential improvements. Also curious about @varun's opinion on this

services/tunnelbroker/make_docker_image.sh
16–17 ↗(On Diff #27904)

The find would address my above concerns, but I agree it's hard to read

i'm pretty indifferent on docker vs docker-compose, but i feel strongly that we should only have one way to build and run services. can we try to make this make_docker_image.sh script extensible to our other services?

This revision is now accepted and ready to land.Jun 22 2023, 11:42 PM

can we try to make this make_docker_image.sh script extensible to our other services?

It's pretty specific to just tunnelbroker (e.g. tunnelbroker messages). I could make a smaller script which just takes the docker file path, and a bunch of other paths to create the context; but that's still going to be a rough 1 to 1 translation of complexity from the current script. I'm also scared of the "let's just extend this thing one more time to also handle this other edge case" (e.g. filtering target/).

The script really just solves building for constrained environments. On my beefy desktop, building from the repo root is a mild inconvenience, but from my laptop; it's a "get some coffee while this runs" type of thing.

You specified explicitly which files/directories get copied - if somebody [unaware of this script] added a new file that doesn't match these paths in the script, it won't be copied, that's what I meant.

I think that's valid. As for awareness, I documented the steps for buildling and deploying the image in the child diff D8312.

Regarding .dockerignore files, @michal found that we can have service-specific dockerignores: https://docs.docker.com/build/building/context/#dockerignore

But we'd need to call them tunnelbroker.Dockerfile and tunnelbroker.Dockerfile.dockerignore. However, that's not a bad solution I think.

Then @michal suggested me another interesting thing:

Why don't we create a file called Dockerfile.dockerignore in services/tunnelbroker with paths that should be ignored (or shouldn't be, by using !path/)

I did the following

cd services/tunnelbroker

# deliberately ignore the 'services/' dir to see if the build fails
echo "services" > Dockerfile.dockerignore

# ../../ is the repo root
docker build -f ./Dockerfile ../...

It indeed worked and ignored stuff defined in services/tunnelbroker/Dockerfile.dockerignore


Maybe we can take advantage of this behavior? Possible dockerignore for Tunnelbroker

# ignore everything initially
*

# copy needed stuff
!services/tunnelbroker
!shared/protos
!shared/tunnelbroker_messages
!scripts/install_protobuf.sh

# ignore specific stuff included above
services/*/target
This revision was automatically updated to reflect the committed changes.