
[services] Backup - Add blob client
ClosedPublic

Authored by karol on Mar 21 2022, 2:39 AM.
Tags
None

Details

Summary

Depends on D3515

Adding blob client to the backup service

Test Plan
cd services
yarn run-backup-service

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed. Mar 21 2022, 2:54 AM
Harbormaster failed remote builds in B7487: Diff 10542!
karol retitled this revision from [services] Backup - Add blob client to [draft] [services] Backup - Add blob client. Mar 21 2022, 3:08 AM
karol edited the summary of this revision.
karol edited the test plan for this revision.
karol added reviewers: max, tomek, varun, jim.
tomek requested changes to this revision. Mar 24 2022, 7:38 AM

This code looks more or less OK, but please improve it by making the log statements useful and by deleting the commented-out code.

services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
52 ↗(On Diff #10552)

What is the reason behind choosing this value?

59 ↗(On Diff #10552)

What is BC? We should choose meaningful names, especially when logging.

93 ↗(On Diff #10552)

This comment doesn't explain why existing data is an issue so important that we have to abort the procedure

services/backup/docker-server/contents/server/src/Reactors/client/blob/ServiceBlobClient.h
29 ↗(On Diff #10552)

It might be a good idea to have more descriptive comments saying what is the issue and where it is tracked.

This revision now requires changes to proceed. Mar 24 2022, 7:38 AM

back to your question

services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
52 ↗(On Diff #10552)

No particular reason for this. In another place in our repo, we used 100, how about this?

I think 100 should suffice. Basically, the limit here should be big enough to store all the data chunks that are queued and waiting for processing. 100 is a lot for something like this.
Even assuming we have really tons of data chunks: every data chunk is going to be ~4 MB (the gRPC limit, AFAIR).
Here's a communication model for this:
client => backup => blob
So we'd need to be sending far more chunks from the client to the backup than from the backup to the blob at the same time to have this queue really growing.

What do you think?

59 ↗(On Diff #10552)

This is a draft, please ignore all logs and comments, they are for me and are going to be deleted.

services/backup/docker-server/contents/server/src/Reactors/client/blob/ServiceBlobClient.h
29 ↗(On Diff #10552)

This is a draft, please ignore all logs and comments, they are for me and are going to be deleted.

tomek requested changes to this revision. Mar 25 2022, 8:27 AM
tomek added inline comments.
services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
52 ↗(On Diff #10552)

What is this data chunk? Can this be e.g. a backup compaction? If that is the case, this compaction could take gigabytes, so this limit sounds like an issue.

But overall, I think that we should avoid having any arbitrary limitations on sizes. Can we find a solution that is unbounded? Maybe we can create a queue when we know what its size is?

It might be also the case that I don't understand what dataChunks mean

This revision now requires changes to proceed. Mar 25 2022, 8:27 AM

back to you

services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
52 ↗(On Diff #10552)

dataChunk is a piece of data. We have to use chunked data because of a gRPC limitation - we can send at most 4 MB of data at once. (This constraint can be changed, but we've never done it so far, and unless it's really needed I don't see a strong reason to. If we one day decide to do it, we'll probably just need to set a new value in a couple of places.)

So if we have 1GB of compaction let's say, then we slice it into parts of 4MB and each of these slices is a data chunk.

Can we find a solution that is unbounded?

I'm not sure. We already use MPMCQueue in the WorkerThread and the limit was introduced in https://phabricator.ashoat.com/D1137, later changed in https://phabricator.ashoat.com/D2015 from 20 to 100.

My bet is that if we already use this collection with a hardcoded capacity, then we've probably already checked it and it is not possible to have an unbounded queue. But if you insist, I can re-check this.

I quickly checked, and MPMCQueue takes Dynamic as a 3rd template argument. The number we pass to the constructor as a capacity specifies how much memory will be allocated up front. From folly's code:

/// The queue has a fixed capacity, for which all memory will be allocated
/// up front.

also

/// The dynamic version of MPMCQueue allows dynamic expansion of queue
/// capacity, such that a queue may start with a smaller capacity than
/// specified and expand only if needed. Users may optionally specify
/// the initial capacity and the expansion multiplier.

On the other hand

/// *** The dynamic version of MPMCQueue is deprecated. ***
/// Use UnboundedQueue instead.

https://github.com/facebook/folly/blob/18eb46b2a0f03abc2f5d2b673f41b844dc253590/folly/MPMCQueue.h#L118

So I think we have a couple of options here:

  1. stay with the bounded MPMCQueue
  2. use deprecated dynamic MPMCQueue
  3. use UnboundedQueue as suggested by folly's authors

I think option 2 is bad; I'd do 1 or 3. My suggestion is the following: let's stay with option 1 in this diff, with a capacity of 100, which is plenty. I'm going to create a task for this, and we can try using UnboundedQueue. This makes more sense to me because, in that case, we may want to replace the bounded MPMCQueue in the WorkerThread as well.

What do you think?

Looks OK as a draft. Going to make a deeper review once the comments and logs are cleaned up.

services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
52 ↗(On Diff #10552)

Ok, we can start with 1 and introduce 3 as a separate task

This revision is now accepted and ready to land. Mar 28 2022, 5:21 AM
services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
52 ↗(On Diff #10552)
karol retitled this revision from [draft] [services] Backup - Add blob client to [services] Backup - Add blob client.

removed "draft"

tomek added inline comments.
services/backup/docker-server/contents/server/src/Reactors/client/blob/ServiceBlobClient.h
46–47 ↗(On Diff #10750)

This is not a draft, so we should clean up the comments.

This revision is now accepted and ready to land. Mar 30 2022, 3:39 AM
services/backup/docker-server/contents/server/src/Reactors/client/blob/ServiceBlobClient.h
46–47 ↗(On Diff #10750)

I know, but here I did it on purpose - these comments indicate what else can be implemented.
If you want, I can remove them, but I see no harm.

This revision was landed with ongoing or failed builds. Mar 31 2022, 7:21 AM
This revision was automatically updated to reflect the committed changes.