Page MenuHomePhabricator

[services][backup] PullBackup 5/5 - blob download stream
ClosedPublic

Authored by bartek on Jan 11 2023, 3:48 PM.
Tags
None
Referenced Files
F3491606: D6246.diff
Wed, Dec 18, 7:48 PM
Unknown Object (File)
Thu, Nov 28, 4:30 AM
Unknown Object (File)
Thu, Nov 28, 4:30 AM
Unknown Object (File)
Thu, Nov 28, 4:30 AM
Unknown Object (File)
Thu, Nov 28, 4:30 AM
Unknown Object (File)
Thu, Nov 28, 4:30 AM
Unknown Object (File)
Nov 10 2024, 10:05 PM
Unknown Object (File)
Nov 9 2024, 3:04 PM
Subscribers

Details

Summary

This diff leverages both ResponseBuffer and BlobStoredItem to create a generic stream that downloads the item from blob service, buffers the data and sends in chunks not exceeding the gRPC message size. A few notes:

  • First message includes "additional extra info" - fields like attachment holders etc. Rest receive only ID and data chunk
  • The data is buffered before sending, this is the solution 2 from https://phab.comm.dev/D4439#126984
  • More data is not downloaded when the buffer is already filled enough
  • Stream ends when there is no more data

The order of instructions is important here:

  1. Push new data if not full
  2. Finish if empty
  3. Pop if not empty

Depends on D6243

Test Plan

At this point, the whole service can be tested with integration tests

cd services
yarn reset-local-cloud
nix run .#comm-blob
RUST_LOG=backup=trace cargo run -- --port 50052 --sandbox --blob-service-url "http://localhost:50051"
yarn run-integration-tests backup

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

bartek held this revision as a draft.
bartek published this revision for review.Jan 11 2023, 3:57 PM

Looks great!

services/backup/src/service/handlers/pull_backup.rs
128 ↗(On Diff #20830)

This API looks unintuitive: it seems like we're taking extra_bytes bytes from the buffer. Can we modify the API to be less confusing?

135 ↗(On Diff #20830)

We can make it slightly more efficient by waiting with sending for the buffer to get saturated or the client to become wholly read. But I don't think it is worth it.

This revision is now accepted and ready to land.Jan 12 2023, 9:32 AM
services/backup/src/service/handlers/pull_backup.rs
128 ↗(On Diff #20830)

maybe

let padding = item.metadata_size(is_first_chunk);
let chunk = buffer.get_chunk(padding);

?

135 ↗(On Diff #20830)

I don't think so. Response data chunks are slightly smaller than those written to the buffer - it gets saturated/desaturated nearly every time, so one write + one read per iteration is a good balance.

Batching it would block either blob service or backup client waiting for each other finishes sending/receiving

services/backup/src/service/handlers/pull_backup.rs
128 ↗(On Diff #20830)

Another idea

Minor renames to make the API more intuitive

tomek added inline comments.
services/backup/src/service/handlers/pull_backup.rs
128 ↗(On Diff #20830)

metadata_size is a lot better name - thanks! As for get_with_padding it is still a bit confusing: someone can think that a padding is added to a result (so size + padding bytes are returned in total). So for me it's better to keep it as is.