Page MenuHomePhabricator

[services] Backup - Pull Backup - Wait for the get reactor to terminate
ClosedPublic

Authored by karol on Jun 23 2022, 4:13 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 10, 12:33 PM
Unknown Object (File)
Sat, Nov 9, 4:04 AM
Unknown Object (File)
Thu, Oct 24, 5:28 PM
Unknown Object (File)
Tue, Oct 22, 2:57 PM
Unknown Object (File)
Mon, Oct 21, 4:46 PM
Unknown Object (File)
Sun, Oct 20, 4:08 AM
Unknown Object (File)
Fri, Oct 18, 9:43 PM
Unknown Object (File)
Fri, Oct 18, 9:43 PM

Details

Summary

Depends on D4326

This is a bug fix which was spotted in https://phab.comm.dev/D4228#121209

The problem was, we didn't wait for the getReactor to gracefully terminate in the PullBackupReactor. The result was, if the client stopped working, the whole service crashed. Now it just shows an error and all objects are deleted properly.

Test Plan

Start pulling the backup and crash the client (you can probably use ctrl+c but I just used panic in rust) in the middle of receiving compaction chunks.

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

karol edited the test plan for this revision. (Show Details)
karol added a reviewer: tomek.
tomek requested changes to this revision.Jun 27 2022, 5:52 AM
tomek added inline comments.
services/backup/src/Reactors/server/PullBackupReactor.cpp
183–185 ↗(On Diff #13709)

Shouldn't we use while here or a second variant of wait- void wait( std::unique_lock<std::mutex>& lock, Predicate stop_waiting ); https://en.cppreference.com/w/cpp/thread/condition_variable/wait to protect against a spurious wakeup?

This revision now requires changes to proceed.Jun 27 2022, 5:52 AM
services/backup/src/Reactors/server/PullBackupReactor.cpp
183–185 ↗(On Diff #13709)

Discussed offline. We decided the best way to go will be to leave this as is but after a wake-up here we should check if the state is really DONE, because if it is not, then we should definitely throw an error.

The thing is, people usually use a while loop for this:

while (cond) { cv.wait(lock); }

but I think we shouldn't do that because we call notify_one only once in the done callback. So if it's called twice (not sure how that could happen) or it is called before the state is set to DONE (this either) then it means that something went terribly wrong and we should definitely throw.

This revision is now accepted and ready to land.Jul 7 2022, 3:46 AM
This revision was landed with ongoing or failed builds.Jul 8 2022, 2:31 AM
This revision was automatically updated to reflect the committed changes.