Details

Reviewers

• max
varun
• jim
tomek

Commits

rCOMMc8039f5647ef: [services] Backup - make blob client instances separate for every thread

Summary

Depends on D3487

https://linear.app/comm/issue/ENG-905/fix-the-threading-problem - the problem and approaches are described in the task

Test Plan

You need 3 terminals
1:

cd services
yarn run-backup-service

2:

cd services
yarn run-blob-service

3:
You can use https://github.com/karol-bisztyga/grpc-playground/tree/backup-async

./build.sh
./cmake/build/bin/client

Then, in the GUI, use the n option (type n and press Enter); it creates a new backup.

Diff Detail

Repository

rCOMM Comm

Branch

backup-phabricator

Lint

No Lint Coverage

Unit

No Test Coverage

Event Timeline


Herald added subscribers: • benschac, atul, • adrian and 2 others. Mar 22 2022, 9:47 AM

Harbormaster completed remote builds in B7524: Diff 10590.Mar 22 2022, 9:51 AM

• karol requested review of this revision.Mar 22 2022, 9:51 AM

• karol retitled this revision from [services] Backup - make blob client instances separate for every thread to [draft] [services] Backup - make blob client instances separate for every thread.Mar 22 2022, 9:53 AM

• karol edited the summary of this revision. (Show Details)

• karol edited the test plan for this revision. (Show Details)

• karol added reviewers: • max, tomek, varun, • jim.

• karol added a parent revision: D3487: [services] Backup - Fix copying big data chunk.

• karol mentioned this in D3467: [services] Backup - Add server reactor implementations - create new backup reactor.Mar 25 2022, 3:22 PM

• karol planned changes to this revision.Mar 28 2022, 10:43 AM

update

Harbormaster failed remote builds in B7629: Diff 10717!Mar 28 2022, 10:57 AM

rebuild

Harbormaster failed remote builds in B7642: Diff 10730!Mar 28 2022, 11:50 AM

build fix needed

fix build

Harbormaster completed remote builds in B7644: Diff 10732.Mar 28 2022, 1:01 PM

• karol planned changes to this revision.Mar 29 2022, 8:40 AM

rebase

Harbormaster completed remote builds in B7661: Diff 10756.Mar 29 2022, 8:51 AM

• karol added a child revision: D3529: [services] Reactors - use doneCallback only from OnDone.Mar 29 2022, 9:10 AM

• karol planned changes to this revision.Mar 29 2022, 9:58 AM

update

Harbormaster completed remote builds in B7666: Diff 10762.Mar 29 2022, 11:44 AM

• karol planned changes to this revision.Mar 30 2022, 6:58 AM

• karol added inline comments.

services/backup/docker-server/contents/server/src/Reactors/client/blob/ServiceBlobClient.h
30 ↗	(On Diff #10762)	remove this

remove log

Harbormaster completed remote builds in B7704: Diff 10806.Mar 30 2022, 7:06 AM

Could you explain why we need this synchronization?

Is there a way to avoid the synchronization?

What are the consequences of synchronizing the handleRequest method? Can we still handle several connections at the same time?

edit:
I just noticed that there's a lot of description in the issue. I'm going to read through that before requesting changes.

This revision now requires changes to proceed.Mar 31 2022, 4:37 PM

tomek removed a reviewer: tomek.Mar 31 2022, 4:39 PM

This revision now requires review to proceed.Mar 31 2022, 4:39 PM

tomek added a reviewer: tomek.Mar 31 2022, 4:39 PM

• karol retitled this revision from [draft] [services] Backup - make blob client instances separate for every thread to [services] Backup - make blob client instances separate for every thread.Apr 1 2022, 12:02 PM

rebase needed

rebase

Harbormaster failed remote builds in B7816: Diff 10983!Apr 4 2022, 10:21 AM

retrigger CI

Harbormaster completed remote builds in B7860: Diff 11034.Apr 5 2022, 8:33 AM

Before analyzing if this code handles synchronization correctly, I'd like to check if it is really the best option.

In the task you described a possible solution:

store them as reactor's members - we could store the client reactor objects as members of CreateNewBackupReactor since we need 1 client reactor per 1 server reactor (because every backup server reactor needs to call the blob service through the client reactor just once). I tried to do it this way but encountered problems. The main problem is (I think) that the client reactor objects are being destroyed while gRPC still tries to perform some operations on them.

Is it possible to fix the main problem by postponing object deletion until the communication is done, e.g. by replacing delete this with setting a callback in which the object is deleted and giving that callback to the other reactor?

Overall, it's not obvious to me why we need complicated synchronization here.

This revision now requires changes to proceed.Apr 7 2022, 6:28 AM

• karol mentioned this in D3485: [services] Backup - Remove backup base Docker image.Apr 7 2022, 6:57 AM

Is it possible to fix the main problem by postponing object deletion until the communication is done, e.g. by replacing delete this with setting a callback in which the object is deleted and giving that callback to the other reactor?

The solution you proposed seems very hacky. First, `delete this` lives in the base reactor's code. Second, I don't understand what "other reactor" means here. Did you mean putReactor?

Overall, it's not obvious to me why we need complicated synchronization here.

Not sure what is unclear here. We have a server reactor that creates a client reactor inside. We need to wait for the client reactor to complete all operations before we terminate the (parent) server reactor.
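The parent-waits-for-child requirement described above can be sketched with a mutex and a condition variable. This is only an illustration under stated assumptions: `CompletionSignal` and `runParentWithChild` are hypothetical names, a plain `std::thread` stands in for the client reactor, and none of this is the code from the diff.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the signal a client reactor would raise
// once all of its operations have completed.
class CompletionSignal {
  std::mutex mutex;
  std::condition_variable cv;
  bool done = false;

public:
  // Called by the child (client reactor) when it has finished.
  void notifyDone() {
    {
      std::lock_guard<std::mutex> lock(mutex);
      assert(!done); // completion is expected to be signalled only once
      done = true;
    }
    cv.notify_all();
  }

  // Called by the parent (server reactor) before it terminates.
  // The predicate guards against spurious wakeups.
  void waitUntilDone() {
    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock, [this] { return done; });
  }
};

// Simulates a parent that starts a child and must not finish before it.
// Returns what the child produced; reading it is safe only after the wait.
std::string runParentWithChild() {
  CompletionSignal signal;
  std::string childResult;

  std::thread child([&] {
    childResult = "uploaded"; // the child's work
    signal.notifyDone();
  });

  signal.waitUntilDone(); // parent blocks here until the child is done
  std::string result = childResult;
  child.join(); // the child must end before `signal` is destroyed
  return result;
}
```

Calling `runParentWithChild()` blocks until the child thread has signalled completion, which is the ordering the server reactor needs before it terminates.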

• karol planned changes to this revision.Apr 7 2022, 8:12 AM

• karol added inline comments.

services/backup/docker-server/contents/server/src/Reactors/server/CreateNewBackupReactor.h
34 ↗	(On Diff #11034)	rename

rename mutex

Harbormaster completed remote builds in B7947: Diff 11136.Apr 7 2022, 8:19 AM

We've discussed this offline with @karol-bisztyga. It looks like the main issue is that we need to call one reactor from another, but in our current solution the caller deletes itself when it is done, which results in the callee being deleted before it can finish its work. We've considered a couple of other options that don't involve mutexes; one of them is promising, but it needs to be verified. Even if we agree to change the approach, there are a lot of diffs in this stack, so it will be a lot cheaper to make such a change in a followup.

Requesting changes because of inline comments, but we will probably keep the overall approach for now.

services/backup/docker-server/contents/server/src/Reactors/server/CreateNewBackupReactor.h
35–36 ↗	(On Diff #11136)	Can you rename it so that it's obvious what it's doing? E.g. `blobDoneCV`. Also wondering if we can use a higher-level mechanism, e.g. `std::latch`
91–96 ↗	(On Diff #11136)	As discussed, we should try to verify if we can block in this method call

This revision now requires changes to proceed.Apr 7 2022, 8:28 AM

• karol added inline comments.Apr 7 2022, 8:48 AM

services/backup/docker-server/contents/server/src/Reactors/server/CreateNewBackupReactor.h
35–36 ↗	(On Diff #11136)	"Can you rename it so that's obvious what it's doing? E.g. `blobDoneCV`." Sure. "Also wondering if we can use higher level mechanism, e.g. `std::latch`" I think we can. The only difference is that instead of using mutex + CV we just use a single latch, right? I think we could cover this in a follow-up; it will probably be easier and faster that way.
91–96 ↗	(On Diff #11136)	I looked at the source code and searched online but found nothing. Thinking about it, I can't see what could go wrong: all gRPC operations are done, so at this point it's just a regular object that lingers briefly before deleting itself. If your intuition still tells you this could be dangerous, we can create a task and follow up there, trying to dig up more information.

rename

Harbormaster completed remote builds in B7951: Diff 11140.Apr 7 2022, 8:56 AM

Accepting, but please create a task where we can discuss the alternatives.

services/backup/docker-server/contents/server/src/Reactors/server/CreateNewBackupReactor.h
36 ↗	(On Diff #11140)	We can skip `CV` in the mutex name
93 ↗	(On Diff #11140)	Could you remind me why we're scheduling sending empty string here?
35–36 ↗	(On Diff #11136)	"The only difference is that instead of using mutex + CV we just use a single latch, right?" Yes. "I think we could cover this in a follow-up, it will be probably easier/faster that way." I don't think it will be hard to update in this diff, but adding it as a followup also doesn't hurt, so it's up to you.
91–96 ↗	(On Diff #11136)	I don't expect that we will find the answers easily. I think it would be a lot better to invest in checking if the other solution might work and only if that's not the case, we should continue the verification here.

This revision is now accepted and ready to land.Apr 7 2022, 8:57 AM

https://linear.app/comm/issue/ENG-979/think-of-different-approaches-to-clientserver-multilayer-problem-in

services/backup/docker-server/contents/server/src/Reactors/server/CreateNewBackupReactor.h
36 ↗	(On Diff #11140)	For me, it's clearer this way: the CV and the mutex are tightly bound, so having one name be a subset of the other makes sense. If you feel strongly, I can change it, but I'd leave it as is.
93 ↗	(On Diff #11140)	To let the client reactor know that there are no more chunks. We operate on the concurrent queue here, so we have to enqueue something that will indicate we're finished, otherwise, the queue read on the other side is going to hang forever. An alternative here is to have `queue<unique_ptr<string>>` instead of `queue<string>` so we can enqueue a `nullptr`. I don't personally mind but I can understand that an empty string may not be crystal clear (although it is also logical since we wouldn't have an empty chunk - that would be pointless).
35–36 ↗	(On Diff #11136)	https://linear.app/comm/issue/ENG-980/replace-cond-variablemutex-witch-latch
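The empty-string terminator described in the comment on line 93 can be sketched as follows. This is a minimal illustration, not the code from the diff: `BlockingQueue`, `sendChunk`, `finish`, `consumeAll`, and `demo` are hypothetical names, and the small hand-written queue stands in for whatever concurrent queue the service actually uses.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue, a hypothetical stand-in for the concurrent queue.
class BlockingQueue {
  std::mutex mutex;
  std::condition_variable cv;
  std::queue<std::string> items;

public:
  void push(std::string item) {
    {
      std::lock_guard<std::mutex> lock(mutex);
      items.push(std::move(item));
    }
    cv.notify_one();
  }

  // Blocks until an item is available.
  std::string pop() {
    std::unique_lock<std::mutex> lock(mutex);
    cv.wait(lock, [this] { return !items.empty(); });
    std::string item = std::move(items.front());
    items.pop();
    return item;
  }
};

// Producer side: real chunks are never empty, so "" is free to act as the
// end-of-stream marker.
void sendChunk(BlockingQueue &queue, const std::string &chunk) {
  assert(!chunk.empty());
  queue.push(chunk);
}

// Enqueues the empty-string sentinel; without it the consumer's pop()
// would block forever after the last chunk.
void finish(BlockingQueue &queue) {
  queue.push("");
}

// Consumer side: reads chunks until the sentinel arrives.
std::vector<std::string> consumeAll(BlockingQueue &queue) {
  std::vector<std::string> received;
  while (true) {
    std::string chunk = queue.pop();
    if (chunk.empty()) {
      break; // sentinel: no more chunks
    }
    received.push_back(std::move(chunk));
  }
  return received;
}

// Runs one producer thread against a consumer on the calling thread.
std::vector<std::string> demo() {
  BlockingQueue queue;
  std::thread producer([&] {
    sendChunk(queue, "chunk-1");
    sendChunk(queue, "chunk-2");
    finish(queue);
  });
  std::vector<std::string> received = consumeAll(queue);
  producer.join();
  return received;
}
```

The `queue<unique_ptr<string>>` alternative mentioned in the same comment would follow the same shape, with a `nullptr` element taking the place of the empty string.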

Ok, I'm landing this now, @palys-swm if you feel like something here is unfinished, please let me know.

Closed by commit rCOMMc8039f5647ef: [services] Backup - make blob client instances separate for every thread. Apr 7 2022, 9:36 AM

This revision was automatically updated to reflect the committed changes.

• karol added a commit: rCOMMc8039f5647ef: [services] Backup - make blob client instances separate for every thread.

tomek mentioned this in D3646: [services] Backup - Send log - initialize put reactor method.Apr 8 2022, 8:16 AM

[services] Backup - make blob client instances separate for every thread
ClosedPublic

Revision Contents
Changeset List

Diff 10590

services/backup/docker-server/contents/server/src/Reactors/client/base-reactors/ClientBidiReactorBase.h

services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h

services/backup/docker-server/contents/server/src/Reactors/client/blob/ServiceBlobClient.h

services/backup/docker-server/contents/server/src/Reactors/server/CreateNewBackupReactor.h
