
Introduce Brotli compression for keyserver -> client socket messages
ClosedPublic

Authored by ashoat on Sep 25 2023, 1:36 PM.
Tags
None
Subscribers

Details

Summary

This addresses ENG-5018, which is one of the follow-ups to ENG-5007.

Scoping this just to staff members for now. If this goes well, we can try the same thing for client -> keyserver socket messages and for HTTP REST endpoints as well.

Depends on D9293

Test Plan

I tested sending a UTF-8 message (an emoji with a skin tone) from web to native and from native to web. I then forced a socket reconnection for the sender and confirmed that the message didn't change once the socket was re-established.

I also tested adding somebody to the Comm org in my local environment. Note that I have only 275 chats inside the Comm org in my local environment, compared to 971 in production.

compressing message with size 7249341 to size 49747 took 1305ms
compressing message with size 7259629 to size 52163 took 1389ms
compressing message with size 6950442 to size 49431 took 1225ms
compressing message with size 7249345 to size 49683 took 1329ms

The compression time was approximately the same whether the sync or async Brotli compression function was used.

While this will increase CPU usage, the massive benefit of shrinking a 6.91 MiB payload to 48.58 KiB definitely outweighs the cost. I also made sure to apply compression only when the payload is at least 4 KiB.

Diff Detail

Repository
rCOMM Comm
Branch
ashoat/brotli
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

native/utils/decompress.js
4 ↗(On Diff #31417)

brotli.js relies on Node.js's Buffer. Recent versions of React Native include this spec-compliant implementation of it

web/.eslintrc.json
8 ↗(On Diff #31417)

Buffer is from Node.js and does not appear on the web. However we polyfill it here
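The referenced line isn't shown in this excerpt, so the shape below is an assumption, but an ESLint polyfill declaration of this kind typically just registers `Buffer` as a known global:

```json
{
  "globals": {
    "Buffer": "readonly"
  }
}
```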

web/utils/decompress.js
1–16 ↗(On Diff #31417)

Identical to native, except there's no need to import Buffer

I was testing compression while trying to reduce report sizes, but I used different packages (ENG-3890) and failed to find anything that worked well on reports. This one seems promising, especially since on native we only use decompression, which is less CPU-intensive than compression. But could you somehow test this with large payloads to make sure it won't kill the app's performance while decompressing?

atul accepted this revision. Edited Sep 26 2023, 9:55 AM

Accepting to unblock; however, it would be good to get thoughts on the following:

  1. It looks like Brotli is included in Zlib in (at least) Node >= 18 LTS: https://nodejs.org/dist/latest-v18.x/docs/api/zlib.html#class-brotlioptions. Would it be possible to leverage this instead of brotli.js (which doesn't seem to have had a "real" commit since ~2016)?
  2. Further, instead of "rolling our own" compressMessage, could we use the WebSocket permessage-deflate extension (https://datatracker.ietf.org/doc/html/rfc7692)? It seems like it can be seamlessly integrated with ws: https://github.com/websockets/ws#websocket-compression. This leverages the Node/zlib library mentioned above out of the box.
  3. For non-WebSocket keyserver endpoints, we could use e.g. Express compression middleware (https://expressjs.com/en/resources/middleware/compression.html) instead of rolling something like compressMessage on our own and applying it on an endpoint-by-endpoint basis.
  4. From the "Express compression middleware" docs:

For a high-traffic website in production, the best way to put compression in place is to implement it at a reverse proxy level (see Use a reverse proxy). In that case, you do not need to use compression middleware. For details on enabling gzip compression in Nginx, see Module ngx_http_gzip_module in the Nginx documentation.

We don't use Nginx, but Apache has a mod_brotli module that we could use to achieve the same thing: https://httpd.apache.org/docs/2.4/mod/mod_brotli.html

At a high level, it seems like we might be able to leverage existing compression capabilities "below" the application layer "for free."

This revision is now accepted and ready to land. Sep 26 2023, 9:55 AM
native/utils/decompress.js
11–14 ↗(On Diff #31417)

It seems like we're making four copies of the data here, I wonder if there's a way to reduce that?

Example of using built-in Node functionality


In D9294#273227, @atul wrote:
  2. Further, instead of "rolling our own" compressMessage, could we use the WebSocket permessage-deflate extension (https://datatracker.ietf.org/doc/html/rfc7692)? It seems like it can be seamlessly integrated with ws: https://github.com/websockets/ws#websocket-compression. This leverages the Node/zlib library mentioned above out of the box.

Looks like this was addressed on Linear: https://linear.app/comm/issue/ENG-5018/compress-large-keyserver-client-payloads#comment-77a4df74

  1. It looks like Brotli is included in Zlib in (at least) Node >= 18 LTS: https://nodejs.org/dist/latest-v18.x/docs/api/zlib.html#class-brotlioptions. Would it be possible to leverage this instead of brotli.js (which doesn't seem to have a "real" commit since ~2016)?

Good call. Made this update!

  2. Further, instead of "rolling our own" compressMessage, could we use the WebSocket permessage-deflate extension (https://datatracker.ietf.org/doc/html/rfc7692)? It seems like it can be seamlessly integrated with ws: https://github.com/websockets/ws#websocket-compression. This leverages the Node/zlib library mentioned above out of the box.

As you pointed out, this is addressed on Linear here.

  3. For non-WebSocket keyserver endpoints, we could use e.g. Express compression middleware (https://expressjs.com/en/resources/middleware/compression.html) instead of rolling something like compressMessage on our own and applying it on an endpoint-by-endpoint basis.

I'll make sure to explore this when looking at compressing e.g. get_initial_redux_state and log_in. I agree that applying it on an endpoint-by-endpoint basis doesn't seem ideal.

  4. From the "Express compression middleware" docs:

For a high-traffic website in production, the best way to put compression in place is to implement it at a reverse proxy level (see Use a reverse proxy). In that case, you do not need to use compression middleware. For details on enabling gzip compression in Nginx, see Module ngx_http_gzip_module in the Nginx documentation.

We don't use Nginx, but Apache has a mod_brotli module that we could use to achieve the same thing: https://httpd.apache.org/docs/2.4/mod/mod_brotli.html

I don't love this solution. I'd rather keep the reverse proxy layer minimally complex. Long-term, the keyserver shouldn't need a reverse proxy at all (it should connect directly to Tunnelbroker).

In D9294#273177, @kamil wrote:

I was testing compression while trying to reduce report sizes, but I used different packages (ENG-3890) and failed to find anything that worked well on reports. This one seems promising, especially since on native we only use decompression, which is less CPU-intensive than compression. But could you somehow test this with large payloads to make sure it won't kill the app's performance while decompressing?

Good thinking... I'll do some additional testing and add it to the Test Plan before landing.

native/utils/decompress.js
11–14 ↗(On Diff #31417)

I don't believe so. I experimented with this a bit and wasn't able to find a better solution. I spent some time trying to do it without base64, since that would significantly reduce the size of the result, but I couldn't get it to work. Feel free to take a stab at it yourself.

PS – it might be 3 copies rather than 4. decompressed is a Uint8Array, and passing that to Buffer.from might just create a "wrapper" around the original data rather than copying it (but I'm not 100% sure).

Remove brotli.js from keyserver in favor of the built-in Node.js utility from the zlib library

keyserver/src/utils/compress.js
7–11

Docs here

19

Node.js's version of Brotli doesn't have a "minimum size" the way brotli.js does, so I had to come up with a minimum myself. I got this from here

22

Docs here

native/utils/decompress.js
4

brotli.js relies on Node.js's Buffer. Recent versions of React Native include this spec-compliant implementation of it

web/.eslintrc.json
8

Buffer is from Node.js and does not appear on the web. However we polyfill it here

web/utils/decompress.js
1–16

Identical to native, except there's no need to import Buffer

keyserver/src/utils/compress.js
22

I update this to use the async version in the next diff: D9298

Looks good, thanks for addressing the feedback!

ashoat edited the test plan for this revision. (Show Details)