
[services][backup] CreateBackup 1/3 - create handler module
ClosedPublic

Authored by bartek on Jan 9 2023, 3:54 AM.
Tags
None
Details

Summary
  • Created a module structure for the gRPC service handlers to keep them organized (sketched in code below):
    - mod service (existing gRPC service mod)
      - mod handlers (groups all handler submodules)
        - create_backup
        - add_attachments
        - ...
  • Scaffolded the CreateBackupHandler structure, which contains the whole endpoint logic. It is a direct analogy to the Blob service PutHandler.
  • Wired this structure into the endpoint handler function.
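A minimal sketch of how that layout could translate into module declarations; paths and visibility here are assumptions, not the actual diff contents:

```rust
// services/backup/src/service/mod.rs (hypothetical excerpt)
pub mod handlers {
  // One file per endpoint handler, e.g. handlers/create_backup.rs.
  pub mod add_attachments;
  pub mod create_backup;
}
```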

The CreateBackupHandler runs in three phases (see the skeleton sketched after this list):

  1. Non-data mode - processing non-chunk inputs like device_id, user_id, etc.
  2. Data mode - processing backup data chunks
  3. Finish - post-processing, saving to the DB, etc.
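Roughly, such a handler could be shaped like the sketch below. This is illustrative only: the enum mirrors the HandlerState refactor mentioned at the end of this revision, but all field and type names are assumptions rather than the actual contents of create_backup.rs.

```rust
// Illustrative skeleton; names are assumptions, not the actual diff.
enum RequestMessage {
  DeviceId(String),
  UserId(String),
  DataChunk(Vec<u8>),
}

struct HandlerError;

enum HandlerState {
  /// Phase 1: consuming non-chunk inputs (device_id, user_id, ...).
  ReceivingParams,
  /// Phase 2: consuming backup data chunks.
  ReceivingData,
}

struct CreateBackupHandler {
  state: HandlerState,
  device_id: Option<String>,
  user_id: Option<String>,
  received_bytes: usize,
}

impl CreateBackupHandler {
  fn new() -> Self {
    Self {
      state: HandlerState::ReceivingParams,
      device_id: None,
      user_id: None,
      received_bytes: 0,
    }
  }

  /// Phases 1-2: called once per message pulled from the request stream.
  fn handle_message(&mut self, msg: RequestMessage) -> Result<(), HandlerError> {
    match (&self.state, msg) {
      (HandlerState::ReceivingParams, RequestMessage::DeviceId(id)) => {
        self.device_id = Some(id);
        Ok(())
      }
      (HandlerState::ReceivingParams, RequestMessage::UserId(id)) => {
        self.user_id = Some(id);
        Ok(())
      }
      // The first data chunk switches the handler into data mode.
      (_, RequestMessage::DataChunk(chunk)) => {
        self.state = HandlerState::ReceivingData;
        self.received_bytes += chunk.len();
        Ok(())
      }
      // Non-chunk inputs arriving after data mode started are a protocol error.
      _ => Err(HandlerError),
    }
  }

  /// Phase 3: post-processing after the stream ends (e.g. DB writes).
  fn finish(self) -> Result<(), HandlerError> {
    // Validate that required params arrived, persist metadata, etc.
    if self.device_id.is_none() || self.user_id.is_none() {
      return Err(HandlerError);
    }
    Ok(())
  }
}
```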

Depends on D6181

Test Plan

This does nothing yet; subsequent diffs will add logic to this code.

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

bartek held this revision as a draft.
bartek published this revision for review. Jan 9 2023, 5:35 AM
bartek added inline comments.
services/backup/src/service/mod.rs
53

This imports UserId, DeviceId, KeyEntropy, etc.
I use a local import because each endpoint has its own Data type: the name is the same, but the definition differs per endpoint.
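For illustration, the pattern being described might look like this; the module path mimics prost's generated code for oneof fields, and all names are assumptions:

```rust
// Hypothetical shape of the prost-generated types.
mod proto {
  pub mod create_new_backup_request {
    // Each endpoint's request message generates its own oneof enum
    // named `Data`, so the names collide across endpoints.
    pub enum Data {
      UserId(String),
      NewCompactionChunk(Vec<u8>),
    }
  }
}

fn handle_create_backup(data: proto::create_new_backup_request::Data) {
  // Local import: unambiguous inside this function, ambiguous at module scope.
  use proto::create_new_backup_request::Data;
  match data {
    Data::UserId(_) => { /* phase 1 input */ }
    Data::NewCompactionChunk(_) => { /* phase 2 chunk */ }
  }
}
```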

services/backup/src/service/mod.rs
61–82 ↗(On Diff #20704)

I think we should revisit data modeling for backup: https://linear.app/comm/issue/ENG-1052.

Having to piece together each individual field is likely a smell; a lot of these fields are related, and I don't think it makes sense to treat them as individual messages. For example, NewCompactionHash and NewCompactionChunk both describe the same bit of information.

Not to mention, there's a non-zero amount of overhead for serializing and deserializing each message.

Handling large files seems to be a non-goal of gRPC.

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

We could look into using presigned URLs and having the client upload the object directly.

100% agree we need to rethink those things, but also want to make sure we don't block the Rust refactor on it

The original design here is pretty questionable

As for transferring large files via gRPC, I would probably treat that as a separate task from the initial data-model rethinking (which I think would be easier to address)

Probably a good argument here for doing this before we start actually using the service

Having to piece together each individual field is likely a smell [...]

Agree, this was already discussed during the Blob service refactor.
A solution that would significantly reduce the complexity of this code and make the API simpler is to at least group the non-data inputs together, as I proposed for the Blob service here: https://linear.app/comm/issue/ENG-937#comment-1a039b4a

Handling large files seems to be a non-goal of gRPC.

This should have been raised long ago, during the Blob service design. Dealing with large data through gRPC is possible but cumbersome: e.g. in the PullBackup endpoint we need to mix business and transport logic by counting individual message field bytes, then buffering and shrinking the data length accordingly, and I hate it (example).
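For readers without context, the gist of that byte counting looks something like the sketch below; the limit and overhead constants are assumptions, not values from the actual PullBackup code:

```rust
// The kind of transport bookkeeping leaking into business logic:
// every outgoing message must stay under the gRPC message size limit,
// so the data chunk has to shrink by the size of the other fields.
const GRPC_CHUNK_SIZE_LIMIT: usize = 4 * 1024 * 1024; // common gRPC default
const PROTO_FIELD_OVERHEAD_BYTES: usize = 16; // rough envelope estimate

fn split_into_chunks(payload: &[u8], metadata_size: usize) -> Vec<&[u8]> {
  let max_data_len =
    GRPC_CHUNK_SIZE_LIMIT - metadata_size - PROTO_FIELD_OVERHEAD_BYTES;
  payload.chunks(max_data_len).collect()
}
```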

We could look into using presigned URLs and having the client upload the object directly.

I'm open to and enthusiastic about this solution, but let's create a Linear task to discuss it further. I've already worked with presigned URLs - this is how Expo's EAS builds and submissions are stored.
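For reference when that task gets created, a presigned upload URL with a recent aws-sdk-s3 could look roughly like this; the bucket name, key layout, and expiry are hypothetical:

```rust
use std::time::Duration;

use aws_sdk_s3::presigning::PresigningConfig;

// Hand the client a short-lived presigned PUT URL so backup bytes go
// straight to S3 instead of through the gRPC stream.
async fn presigned_upload_url(
  client: &aws_sdk_s3::Client,
  backup_id: &str,
) -> Result<String, Box<dyn std::error::Error>> {
  let request = client
    .put_object()
    .bucket("comm-backup-data") // hypothetical bucket
    .key(format!("compactions/{backup_id}")) // hypothetical key layout
    .presigned(PresigningConfig::expires_in(Duration::from_secs(15 * 60))?)
    .await?;
  Ok(request.uri().to_string())
}
```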

100% agree we need to rethink those things, but also want to make sure we don't block the Rust refactor on it

Right, changing the API design isn't a part of this task. One advantage is that the Rust service should be much easier to adapt to API changes than the old C++ one.

tomek added inline comments.
services/backup/src/service/handlers/create_backup.rs
22 ↗(On Diff #20704)

Usually it is more maintainable to use an enum state instead of a boolean. It makes a difference once we decide to add a second flag (sketched below).
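The maintainability argument, as a sketch with hypothetical names:

```rust
// With booleans, each new flag multiplies the state space and permits
// combinations that should be impossible:
struct WithFlags {
  is_data_mode: bool,
  is_finished: bool, // what does { is_data_mode: true, is_finished: true } mean?
}

// An enum names exactly the valid states; a later phase is one new variant:
enum HandlerState {
  ReceivingParams,
  ReceivingData,
  Finished,
}
```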

services/backup/src/service/mod.rs
21–22 ↗(On Diff #20704)

Maybe I'm missing something, but does pub(self) make a difference?
https://github.com/rust-lang/rfcs/blob/master/text/1422-pub-restricted.md#semantics

As noted above, the definition means that pub(self) item is the same as if one had written just item.
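In other words, these two declarations are equivalent:

```rust
mod service {
  pub(self) fn helper_a() {} // visible only within `service`...
  fn helper_b() {}           // ...exactly the same as a plain private item
}
```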

55 ↗(On Diff #20704)

Is it a good idea to log the whole request? Can that result in logging binary data?

services/backup/src/service/handlers/create_backup.rs
22 ↗(On Diff #20704)

I have no strong opinion on this; I can refactor it to an enum.

services/backup/src/service/mod.rs
21–22 ↗(On Diff #20704)

No difference; I just wanted to be explicit.

services/backup/src/service/mod.rs
55 ↗(On Diff #20704)

In the case of stream requests, it won't print any actual inputs, only message: Streaming:

CreateNewBackup request: Request { metadata: MetadataMap { headers: {"te": "trailers", "content-type": "application/grpc", "user-agent": "tonic/0.8.3"} }, message: Streaming, extensions: Extensions }

In fact, I'm wondering if it's worth printing this object at all - there isn't much useful data here. The main reason I put this log in was to signal that a new request has just started processing.

This revision is now accepted and ready to land. Jan 11 2023, 9:20 PM
  • Refactored the is_data_mode boolean into a HandlerState enum
  • Removed the user_id field from logs