
[services][backup] CreateBackup 1/3 - create handler module
ClosedPublic

Authored by bartek on Jan 9 2023, 3:54 AM.
Tags
None
Details

Summary
  • Created a module structure for the gRPC service handlers to keep them organized (sketched in code below):
    - mod service (existing gRPC service mod)
      - mod handlers (groups all handler submodules)
        - create_backup
        - add_attachments
        - ...
  • Scaffolded the CreateBackupHandler structure, which contains the whole endpoint logic. It is a direct analogy to the Blob service PutHandler.
  • Wired this structure into the endpoint handler function.
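A minimal sketch of how that layout could translate into module declarations; paths and visibility here are assumptions, not the actual diff contents:

```rust
// services/backup/src/service/mod.rs (hypothetical excerpt)
pub mod handlers {
  // One file per endpoint handler, e.g. handlers/create_backup.rs.
  pub mod add_attachments;
  pub mod create_backup;
}
```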

The CreateBackupHandler runs in three phases (see the skeleton sketched after this list):

  1. Non-data mode - processing non-chunk inputs like device_id, user_id, etc.
  2. Data mode - processing backup data chunks
  3. Finish - post-processing, saving to the DB, etc.
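Roughly, such a handler could be shaped like the sketch below. This is illustrative only: the enum mirrors the HandlerState refactor mentioned at the end of this revision, but all field and type names are assumptions rather than the actual contents of create_backup.rs.

```rust
// Illustrative skeleton; names are assumptions, not the actual diff.
enum RequestMessage {
  DeviceId(String),
  UserId(String),
  DataChunk(Vec<u8>),
}

struct HandlerError;

enum HandlerState {
  /// Phase 1: consuming non-chunk inputs (device_id, user_id, ...).
  ReceivingParams,
  /// Phase 2: consuming backup data chunks.
  ReceivingData,
}

struct CreateBackupHandler {
  state: HandlerState,
  device_id: Option<String>,
  user_id: Option<String>,
  received_bytes: usize,
}

impl CreateBackupHandler {
  fn new() -> Self {
    Self {
      state: HandlerState::ReceivingParams,
      device_id: None,
      user_id: None,
      received_bytes: 0,
    }
  }

  /// Phases 1-2: called once per message pulled from the request stream.
  fn handle_message(&mut self, msg: RequestMessage) -> Result<(), HandlerError> {
    match (&self.state, msg) {
      (HandlerState::ReceivingParams, RequestMessage::DeviceId(id)) => {
        self.device_id = Some(id);
        Ok(())
      }
      (HandlerState::ReceivingParams, RequestMessage::UserId(id)) => {
        self.user_id = Some(id);
        Ok(())
      }
      // The first data chunk switches the handler into data mode.
      (_, RequestMessage::DataChunk(chunk)) => {
        self.state = HandlerState::ReceivingData;
        self.received_bytes += chunk.len();
        Ok(())
      }
      // Non-chunk inputs arriving after data mode started are a protocol error.
      _ => Err(HandlerError),
    }
  }

  /// Phase 3: post-processing after the stream ends (e.g. DB writes).
  fn finish(self) -> Result<(), HandlerError> {
    // Validate that required params arrived, persist metadata, etc.
    if self.device_id.is_none() || self.user_id.is_none() {
      return Err(HandlerError);
    }
    Ok(())
  }
}
```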

Depends on D6181

Test Plan

This does nothing yet; subsequent diffs will add logic to this code.

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

bartek held this revision as a draft.
bartek published this revision for review. Jan 9 2023, 5:35 AM
bartek added inline comments.
services/backup/src/service/mod.rs
53

This imports UserId, DeviceId, KeyEntropy, etc.
I use a local import because each endpoint has its own Data type: the name is the same, but the definition differs per endpoint.
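For illustration, the pattern being described might look like this; the module path mimics prost's generated code for oneof fields, and all names are assumptions:

```rust
// Hypothetical shape of the prost-generated types.
mod proto {
  pub mod create_new_backup_request {
    // Each endpoint's request message generates its own oneof enum
    // named `Data`, so the names collide across endpoints.
    pub enum Data {
      UserId(String),
      NewCompactionChunk(Vec<u8>),
    }
  }
}

fn handle_create_backup(data: proto::create_new_backup_request::Data) {
  // Local import: unambiguous inside this function, ambiguous at module scope.
  use proto::create_new_backup_request::Data;
  match data {
    Data::UserId(_) => { /* phase 1 input */ }
    Data::NewCompactionChunk(_) => { /* phase 2 chunk */ }
  }
}
```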

services/backup/src/service/mod.rs
61–82 ↗(On Diff #20704)

I think we should revisit data modeling for backup: https://linear.app/comm/issue/ENG-1052.

Having to piece together each individual field is likely a smell; a lot of these fields are related, and I don't think it makes sense to treat them as individual messages. For example, NewCompactionHash and NewCompactionChunk both describe the same bit of information.

Not to mention, there's a non-zero amount of overhead for serializing and deserializing each message.

Handling large files seems to be a non-goal of gRPC.

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

We could look into using presigned URLs and having the client upload the object directly.

100% agree we need to rethink those things, but also want to make sure we don't block the Rust refactor on it

The original design here is pretty questionable

As for transferring large files via gRPC, I would probably treat that as a separate task from the initial data-model rethinking (which I think would be easier to address)

Probably a good argument here for doing this before we start actually using the service

Having to piece together each individual field is likely a smell [...]

Agree, this was already discussed during the Blob service refactor.
A solution that would significantly reduce the complexity of this code and make the API simpler is to at least group the non-data inputs together, as I proposed for the Blob service here: https://linear.app/comm/issue/ENG-937#comment-1a039b4a

Handling large files seems to be a non-goal of gRPC.

This should have been raised long ago, during the Blob service design. Dealing with large data through gRPC is possible but cumbersome: e.g. in the PullBackup endpoint we need to mix business and transport logic by counting individual message field bytes, then buffering and shrinking the data length accordingly, and I hate it (example).
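For readers without context, the gist of that byte counting looks something like the sketch below; the limit and overhead constants are assumptions, not values from the actual PullBackup code:

```rust
// The kind of transport bookkeeping leaking into business logic:
// every outgoing message must stay under the gRPC message size limit,
// so the data chunk has to shrink by the size of the other fields.
const GRPC_CHUNK_SIZE_LIMIT: usize = 4 * 1024 * 1024; // common gRPC default
const PROTO_FIELD_OVERHEAD_BYTES: usize = 16; // rough envelope estimate

fn split_into_chunks(payload: &[u8], metadata_size: usize) -> Vec<&[u8]> {
  let max_data_len =
    GRPC_CHUNK_SIZE_LIMIT - metadata_size - PROTO_FIELD_OVERHEAD_BYTES;
  payload.chunks(max_data_len).collect()
}
```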

We could look into using presigned URLs and having the client upload the object directly.

I'm open to and enthusiastic about this solution, but let's create a Linear task to discuss it further. I've already worked with presigned URLs - this is how Expo's EAS builds and submissions are stored.
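For reference when that task gets created, a presigned upload URL with a recent aws-sdk-s3 could look roughly like this; the bucket name, key layout, and expiry are hypothetical:

```rust
use std::time::Duration;

use aws_sdk_s3::presigning::PresigningConfig;

// Hand the client a short-lived presigned PUT URL so backup bytes go
// straight to S3 instead of through the gRPC stream.
async fn presigned_upload_url(
  client: &aws_sdk_s3::Client,
  backup_id: &str,
) -> Result<String, Box<dyn std::error::Error>> {
  let request = client
    .put_object()
    .bucket("comm-backup-data") // hypothetical bucket
    .key(format!("compactions/{backup_id}")) // hypothetical key layout
    .presigned(PresigningConfig::expires_in(Duration::from_secs(15 * 60))?)
    .await?;
  Ok(request.uri().to_string())
}
```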

100% agree we need to rethink those things, but also want to make sure we don't block the Rust refactor on it

Right, changing the API design isn't a part of this task. One advantage is that the Rust service should be much easier to adapt to API changes than the old C++ one.

tomek added inline comments.
services/backup/src/service/handlers/create_backup.rs
22 ↗(On Diff #20704)

Usually it is more maintainable to use an enum state instead of a boolean. It makes a difference once we decide to add a second flag (sketched below).
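The maintainability argument, as a sketch with hypothetical names:

```rust
// With booleans, each new flag multiplies the state space and permits
// combinations that should be impossible:
struct WithFlags {
  is_data_mode: bool,
  is_finished: bool, // what does { is_data_mode: true, is_finished: true } mean?
}

// An enum names exactly the valid states; a later phase is one new variant:
enum HandlerState {
  ReceivingParams,
  ReceivingData,
  Finished,
}
```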

services/backup/src/service/mod.rs
21–22 ↗(On Diff #20704)

Maybe I'm missing something, but does pub(self) make a difference?
https://github.com/rust-lang/rfcs/blob/master/text/1422-pub-restricted.md#semantics

As noted above, the definition means that pub(self) item is the same as if one had written just item.
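In other words, these two declarations are equivalent:

```rust
mod service {
  pub(self) fn helper_a() {} // visible only within `service`...
  fn helper_b() {}           // ...exactly the same as a plain private item
}
```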

55 ↗(On Diff #20704)

Is it a good idea to log the whole request? Can that result in logging binary data?

services/backup/src/service/handlers/create_backup.rs
22 ↗(On Diff #20704)

I have no strong opinion on this; I can refactor it to an enum.

services/backup/src/service/mod.rs
21–22 ↗(On Diff #20704)

No difference; I just wanted to be explicit.

services/backup/src/service/mod.rs
55 ↗(On Diff #20704)

In the case of stream requests, it won't print any actual inputs, only message: Streaming:

CreateNewBackup request: Request { metadata: MetadataMap { headers: {"te": "trailers", "content-type": "application/grpc", "user-agent": "tonic/0.8.3"} }, message: Streaming, extensions: Extensions }

In fact, I'm wondering if it's worth printing this object at all - there isn't much useful data here. The main reason I put this log in was to signal that a new request has just started processing.

This revision is now accepted and ready to land. Jan 11 2023, 9:20 PM
  • Refactored the is_data_mode boolean into a HandlerState enum
  • Removed the user_id field from logs