Page MenuHomePhabricator

[backup-client] Compile to wasm
ClosedPublic

Authored by michal on Feb 12 2024, 7:18 AM.
Tags
None
Referenced Files
F3764408: D11041.id.diff
Sat, Jan 11, 12:46 PM
Unknown Object (File)
Sat, Jan 11, 4:29 AM
Unknown Object (File)
Thu, Jan 9, 8:45 PM
Unknown Object (File)
Thu, Jan 9, 4:00 AM
Unknown Object (File)
Wed, Jan 8, 5:19 PM
Unknown Object (File)
Wed, Jan 8, 5:15 PM
Unknown Object (File)
Wed, Jan 8, 2:26 PM
Unknown Object (File)
Sun, Jan 5, 3:37 PM
Subscribers

Details

Summary

Compiles backup client to WASM for web. Uses a similar approach to SQliteQueryExecutor where we keep the files inside of the git repository.

  • tokio-tungstenite-wam allows us to have one API for native (tungstenite) and web websockets. It has very similar API to tungstenite but required a few small changes that I will annotate
  • web.rs contains a wasm-only file providing JS<->Rust conversions and helpers
  • A few structs got either serde implementations or conditional wasm_bindgen attributes so that they are available in JS or can be deserialized as arguments from JS

Adding @ashoat as reviewer because of dependency changes.

Depends on D11040

Test Plan
  • Check that native still compiles (both Android and iOS)
  • Check that compiling to WASM works (yarn build)
  • Testing that backup client in WASM works is tested in the future diffs

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
web/backup-client-wasm/package.json
1–14 ↗(On Diff #37069)

You appear to be adding a new NPM project here, but excluding it from our workspace. Can you explain the motivations for this decision, and why it's being excluded from the workspaces list in our root package.json?

Additionally – if this approach diverges from the approach we took with comm_query_executor.wasm, can you explain the motivations for that as well?

6–9 ↗(On Diff #37069)

Not sure this does anything if we never publish

12 ↗(On Diff #37069)

Should we have a clean step? Should the clean step be run when the user runs yarn cleaninstall at the root?

web/package.json
14 ↗(On Diff #37069)

Why does this NPM script need to exist? It seems like it's just an alias for another NPM script. Can't we run that first NPM script directly?

If we keep this, can we try yarn workspace backup-client-wasm build?

web/backup-client-wasm/package.json
1–14 ↗(On Diff #37069)

Looks like it's added in D11042. I'm still curious about differences with the approach we took with comm_query_executor.wasm, though

Updated the approach to be more similar to the comm_query_executor (so no additional yarn workspace, the rust crate is now part of web).

Confused why Phabricator isn't hiding the files in the _generated folder.

Me too, I'm not sure why. I wonder if it's broken? D10838 (a random diff which modified comm_query_executor) also doesn't have hidden files.

  1. I modified the existing CI job to also check the backup client WASM. I'm not sure how to proceed with the filename though. Should I just rename it to something like "wasm-build.yml" (and change filename in buildkite pipeline definition) and devs would be expected to rebase onto these changes to fix the CI?
  2. Task created here: ENG-6818
  3. The code isn't tightly coupled and changes in one won't result in changes to the other. Also the build systems are completely different (emscripten vs cargo+wasm-pack) so I don't think the emscripten build script should build the backup client code. Additionally I don't think we need to have a separate script for the backup client as wasm-pack deals with the complexity for use. We don't need to specify all the different files we want to include in the resulting WASM, it's done automatically as a part of rust build. A package.json command that runs web-pack with required flags should be enough.

I've decided to go with the comm_query_executor.wasm approach for the same reasons we decided to with it for the database code: having the .wasm file in the repo greatly reduces the developer friction compared to having to upload a new package to NPM, especially for developers who haven't done it before.

The tradeoffs for the the different approaches for the comm_query_executor were discussed in this comment. In particular for the NPM package:

NPM package
Similar to olm, but I am afraid that it will be really painful, after looking at the codebase a think changes to the database are rather frequent, which means we will need to (each time there are some changes) build this package, publish, bump version in package.json and after make a web release. And also somehow setup dev environment to reflect changes in database files, which is basically similar to option 2 but without the docker problem.


On the other hand we have already added some rust build step during yarn cleaninstall -> for the keyserver we build the rust-node-addon. So there would already be some precedence for every dev building it. If we decided to go with this approach we would:

  1. Add clean-backup-client-wasm to yarn clean
  2. Add build-backup-client-wasm to web's postinstall script
  3. .gitignore the generated files

On my machine (M2 Pro, 16GB), running build-backup-client-wasm after a yarn cleaninstall takes about ~22s with the global cargo cache having all dependencies (so everything is downloaded but needs to be compiled from scratch).

shared/backup_client/Cargo.toml
27–32 ↗(On Diff #37069)

wasm-bindgen was already used by comm-opaque2.
Licenses:

  • wasm-bindgen, wasm-bindgen-futures, web-sys (they share the repo): double licensed: MIT, Apache
  • tokio-tungstenite-wasm: MIT
  • serde-wasm-bindgen: MIT
web/backup-client-wasm/package.json
1–14 ↗(On Diff #37069)

The differences between this and comm_query_executor were mostly because one is C++ with emscripten and another is a Rust project so I was mixing approaches between comm_query_executor and comm-opaque2-wasm. I've simplified the code a bit now, so that it's now more similar to comm_query_executor and it's no longer a new yarn workspace/npm project.

I've explained a bit more about why I chose this general approach in the top level comment.

12 ↗(On Diff #37069)

Added a clean step in the web/package.json.

Using the approach where we commit the files to the repo, we shouldn't add it to the yarn cleaninstall command (the files should always be current so there shouldn't be a need to re-build them).

web/package.json
14 ↗(On Diff #37069)

I wanted to keep the build-db-wasm and build-backup-client-wasm commands together so they are more discoverable.

I've now moved the whole logic into these top-level (from the web workspace perspective) commands.

Seems like the emscripten CI uses emscripten/emsdk:3.1.17 docker image which probably doesn't have yarn. In that case it would probably be easier to introduce a new buildkite job for backup-client-wasm.

New approach makes sense to me!

Seems like the emscripten CI uses emscripten/emsdk:3.1.17 docker image which probably doesn't have yarn. In that case it would probably be easier to introduce a new buildkite job for backup-client-wasm.

That makes sense to me

Added a clean step in the web/package.json.

Using the approach where we commit the files to the repo, we shouldn't add it to the yarn cleaninstall command (the files should always be current so there shouldn't be a need to re-build them).

Hmm... how about the target folder?

.eslintignore
11 ↗(On Diff #37262)

Any reason you gave a specific filename here instead of the whole _generated?

Reverted the changes to emscripten buildkite CI job for now, I will add a new buildkite job in a later revision. Add the target folder to the general yarn workspace web clean command.

.eslintignore
11 ↗(On Diff #37262)

Sorry, I forgot to flag this in an inline comment because there were a lot of changes.

I'm having a bit of a problem with specifying flow types for the WASM module. In this revision I've created a web/backup-client-wasm/_generated/backup-client-wasm.js.flow file with flow types according to docs. And it works great except that it's in a _generated folder...

Do you maybe have any ideas for a better way to solve this? Docs for library definitions say

For defining modules that are accessed via a relative require/import path, please see the docs on the .flow files

So I don't think we can use them for this purpose.

comm-query-executor defines a function getDatabaseModule that casts any to a type defined in web/database/types/module.js. This works well because it only has one export. I guess we could go with something similar for backup client but it default exports an init function, a RequestedData enum and a BackupClient class so it's not as clean.

Confused why Phabricator isn't hiding the files in the _generated folder.

I think we need to add a @generated phrase to the file if we want to hide it - like here.

web/backup-client-wasm/_generated/backup-client-wasm.js.flow
1 ↗(On Diff #37300)

I think the fact that this file is in /_generated means that after cleaning a building backup-client I will end up with deleted types. Maybe there is some workaround to improve it?

web/backup-client-wasm/_generated/backup-client-wasm.js.flow
3 ↗(On Diff #37300)

shouldn't we type it better like here?

You could potentially define the types in a different place entirely in web. For instance, consider how we type the ethers.js package – instead of a libdef, we just have a EthersProvider type that covers the four functions we use on it. We import directly from the package, and then we cast the import to our type.

Another option is to keep the .flow types in the same folder, but to name the folder something different than generated, and to be more specific (specific to files rather than general to the whole folder) in our .gitignore, .eslintignore, and rm-rf workflows. This is what we do with our grpc-web bindings, which are stored in web/protobufs. We could use the @generated pragma for the files that are actually generated, and skip it for the files that aren't.

ashoat requested changes to this revision.Feb 16 2024, 9:51 AM

Back to you for the Flow stuff, but this is looking almost ready

This revision now requires changes to proceed.Feb 16 2024, 9:51 AM
  • Renamed _generated folder to wasm because it will also include the flow types
  • Made the clean-backup-client-wasm skip the flow type file, and the eslint skip only the generated file
  • Added a run_wasmpack.sh script (inspired by run_emscripten.sh) as there are two additional actions we need to take and I think this is enough complexity to warrant a script file:
    • remove the autogenerated .gitignore file
    • add "@generated" at the top of the generated JS file
  • Update the flow types

Additionally I have created a task for the buildkite CI job: ENG-6873. I will add a diff later in the stack with this code, and land this only after the buildkite job is accepted. I did this because I wasn't able to get to implementing it today, but I wanted feedback on this (and later) diffs faster.

This revision is now accepted and ready to land.Feb 19 2024, 7:50 PM

Would be good to have somebody else approve the Rust code, as I'm not capable of reviewing that. @bartek is out so maybe @kamil?

This revision now requires review to proceed.Feb 19 2024, 7:51 PM
kamil added inline comments.
web/backup-client-wasm/wasm/backup-client-wasm.js.flow
3 ↗(On Diff #37332)

shouldn't we type it like this?

This revision is now accepted and ready to land.Feb 21 2024, 4:40 AM

As discussed in ENG-6873 : Add a buildkite job for checking integrity of backup client wasm file, we are changing thr approach and instead of keeping the wasm file in git repo we will be building it on cleaninstall as we do with the keyserver node addon.

  • Remove the generated files from repo and add them to the gitignore
  • Make cleaninstall also clear these files
  • Add a postinstall script to web which runs the backup client wasm build
  • No longer add the @generated header to the generated js file

Fixes for keyserver docker build.

Try adding wasm pack as a yarn dependency for non-nix workflows.

michal requested review of this revision.Mar 4 2024, 7:37 AM
michal added 1 blocking reviewer(s): ashoat.

Requesting review again because there were some changes:

  • As mentioned in one of the previous comments, the generated files are no longer in repo and instead we have a postinstall script
  • wasm-pack (the CLI we are using for easy compiling to ready-to-use wasm) is provided on nix which is problematic for other platforms like CI. There are other download options and I decided to go with a pre-built npm dependency in web, as the easiest and the fastest (cargo install would compile from scratch)
  • There were some changes required in the keyserver dockerfile to bring in our rust shared/ deps
keyserver/Dockerfile
59

wasm-pack uses a wasm-opt tool which optimizes wasm files size. It's part of the binaryen suite of tools.

For some reason when using building the keyserver image locally I'm getting:

Error: no prebuilt wasm-opt binaries are available for this platform: Unrecognized target! To disable wasm-opt, add wasm-opt = false to your package metadata in your 'Cargo.toml'.
  • This does not happen when wasm-pack is installed with nix and building the WASM outside of docker
  • I didn't want to disable wasm-opt
  • Installing binaryen makes the wasm-pack use the package on $PATH and the docker builds successfuly
    • This PR added the option of using the version installed globally
  • There are a few related github issues:
  • Some of them are closed but we are using the versions with the fixes
  • Some other suggestions from the issues:
    • Disabling the wasm opt but I think we should avoid that
    • Using cargo install wasm-pack --git https://github.com/rustwasm/wasm-pack --rev ae10c23cc14b79ed37a7be222daf5fd851b9cd0d but 1. this will compile it from scratch 2. uses an older version of the wasm-pack 3. Is more of a workaround than a fix

As mentioned in one of the previous comments, the generated files are no longer in repo and instead we have a postinstall script

I know this was discussed earlier, but can you remind me why we're taking the rust-node-addon approach here instead of the opaque-ke-wasm approach? Is it because of issues getting the CI pipeline working?

.eslintignore
9

Is there a similar pkg folder we should add to .eslintignore for backup-client-wasm?

keyserver/Dockerfile
54

Please add a comment here explaining why we need binaryen

59

Thanks for explaining all of the debugging you've done!

59

I wonder if this could be "solved" by installing wasm-opt via NPM?

117

Can we use a dash for the naming? I can see that web/scripts/run_emscripten.sh does not follow this convention, but in general I think we use dashes for filenames instead of underscores

web/package.json
9

It looks like when the input to the WASM changes, yarn dev won't know to recompile. In those cases, does the dev need to run yarn build-backup-client-wasm manually? And then after, run yarn dev again?

16

Should this new postinstall.sh file be in web/scripts?

Renamed the script to run-emscripten.sh, moved postinstall.sh to scripts. Added web/backup-client-wasm/target to .dockerignore.

I know this was discussed earlier, but can you remind me why we're taking the rust-node-addon approach here instead of the opaque-ke-wasm approach? Is it because of issues getting the CI pipeline working?

Yes, basically the it WASM compilation wasn't as deterministic as I had hoped and I wasn't able to make the CI work. Because of this we can't guarantee that the repo has the most current changes. With these changes we can be sure that after yarn cleaninstall we have the current WASM code.

.eslintignore
9

I rename the pkg folder to wasm with the --out-dir "${OUT_DIR}" flag in run-wasmpack.sh script. We don't .eslintignore the whole folder because in case of backup client it also contains the .flow file with types.

We could stop passing the flag and instead just use the pkg folder but because we have some non-generated code inside I think the wasm folder name makes more sense.

keyserver/Dockerfile
59

I was able to build the docker image with the binaryen NPM package. So I removed the native package here and added it to package.json.

web/package.json
9

Yes, the dev will need to run yarn build-backup-client-wasm manually. But webpack (more specifically the CopyPlugin) should pick up the changes automatically after that.

Fix script path in dockerfile

Try reverting binaryen to the apt install because of CI failure.

When using binaryen from npm I was getting this error in CI:

<...binary blob...>

RangeError: Invalid typed array length: 4289064482
    at new Int8Array (<anonymous>)
    at Object.write (/comm/node_modules/binaryen/bin/wasm-opt:2:9155272)
    at Object.write (/comm/node_modules/binaryen/bin/wasm-opt:2:9201027)
    at doWritev (/comm/node_modules/binaryen/bin/wasm-opt:2:9193410)
    at _fd_write (/comm/node_modules/binaryen/bin/wasm-opt:2:9193623)
    at wasm://wasm/01a172d6:wasm-function[2458]:0x2a68c7
    at wasm://wasm/01a172d6:wasm-function[1276]:0x1138a7
    at wasm://wasm/01a172d6:wasm-function[534]:0xa4e68
    at wasm://wasm/01a172d6:wasm-function[2403]:0x2a5068
    at wasm://wasm/01a172d6:wasm-function[2451]:0x2a6628
Error: failed to execute `wasm-opt`: exited with exit status: 7
  full command: "/comm/web/node_modules/.bin/wasm-opt" "/comm/web/scripts/../backup-client-wasm/wasm/backup-client-wasm_bg.wasm" "-o" "/comm/web/scripts/../backup-client-wasm/wasm/backup-client-wasm_bg.wasm-opt.wasm" "-O"
To disable `wasm-opt`, add `wasm-opt = false` to your package metadata in your `Cargo.toml`.

so I decided to revert back to the binaryen installed from apt. I don't think using either one over the other is much of a difference so I didn't do much investigation why the error happened (also the apt version might a bit faster but this is just a gut feeling, I didn't do any benchmarks). If the npm version is more preferable and I should spend some time trying to make it work, let me know.

This revision is now accepted and ready to land.Mar 5 2024, 4:20 PM