Page MenuHomePhabricator

[services] Backup - Fix copying big data chunk
ClosedPublic

Authored by karol on Mar 22 2022, 1:23 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Jun 30, 12:07 PM
Unknown Object (File)
Sun, Jun 30, 12:07 PM
Unknown Object (File)
Sun, Jun 30, 12:07 PM
Unknown Object (File)
Sun, Jun 30, 12:07 PM
Unknown Object (File)
Sun, Jun 30, 12:07 PM
Unknown Object (File)
Sun, Jun 30, 12:07 PM
Unknown Object (File)
Sun, Jun 30, 12:07 PM
Unknown Object (File)
Sun, Jun 30, 12:04 PM

Details

Summary

Depends on D3471

Once we receive data as a gRPC field, we don't want to copy it around, especially when it's a data chunk and the size of it may reach up to 4-5MB. Instead, it's always better to have just one place in memory for it and pass the address around.

Test Plan
cd services
yarn run-backup-service

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

karol retitled this revision from [services] Backup - fix copying big data chunk to [draft] [services] Backup - fix copying big data chunk.Mar 22 2022, 1:25 AM
karol edited the summary of this revision. (Show Details)
karol edited the test plan for this revision. (Show Details)
karol added reviewers: max, varun, tomek, jim.
karol retitled this revision from [draft] [services] Backup - fix copying big data chunk to [services] Backup - Fix copying big data chunk.
tomek requested changes to this revision.Mar 30 2022, 4:36 AM
tomek added inline comments.
services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
53–54 ↗(On Diff #10761)

Are you sure this is a good idea? We take this by reference and then move - does it destroy the original data?

This revision now requires changes to proceed.Mar 30 2022, 4:36 AM
karol added inline comments.
services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
53–54 ↗(On Diff #10761)

Yes. I know this looks dangerous, we should use this being aware that the original data will no longer be accessible. I thought it would crash but I ran it and just had an empty string after something like this:

string *str = response.datachunk();
scheduleSendingDataChunk(str);
court << str->size << endl;

Anyway, even though it didn't crash for me, I think we should assume a crash will occur.
For now, I don't see a better way of doing this. I think if we use a unique pointer, we will have the same situation.
The goal here is to prevent copying a big amount of data around.
Even if someone mistakenly uses this value after a function call, they will quickly see a problem there - for sure we should cover this in the tests.
We can add an informative comment there BTW.

What you think?

ashoat requested changes to this revision.Apr 1 2022, 12:09 PM
ashoat added inline comments.
services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
53–54 ↗(On Diff #10761)

Why would we have the same issue with a unique_ptr? This is basically exactly what unique_ptr is for. When you pass in a unique_ptr to a function, you are saying "this thing is now yours, I no longer can access it" and the compiler will enforce that.

When you pass a reference like you are doing, the caller has no reason to expect that the data will destroyed. When you pass a unique_ptr, the caller knows they cannot access this data ever again.

This revision now requires changes to proceed.Apr 1 2022, 12:09 PM
services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
53–54 ↗(On Diff #10761)

I mean, yes, you're right about the idea behind the unique pointer but you don't seem to be right that the compiler's going to be helpful here. Something like this crashes:

#include <iostream>
#include <memory>

using namespace std;

void f(unique_ptr<string> strp) {
  cout << "f " << *strp << endl;
}

int main()
{
  unique_ptr<string> str = make_unique<string>("asdasd");
  f(move(str));
  cout << "is str nullptr? " << (str == nullptr) << endl; // true
  cout << "out " << *str << endl; // crash

  return 0;
}

So, after a move, unique_ptr holds a nullptr. We can perform a check against the nullptr but if we don't and just try to access the pointer right away and it is null, then it's going to crash.

But I agree that passing a unique pointer here is good because it indicates that the data's not going to be available anymore.

use unique_ptr instead of raw pointers

It looks like there's a build error caused by some other diff

services/backup/docker-server/contents/server/src/Reactors/client/blob/BlobPutClientReactor.h
53–54 ↗(On Diff #10761)

Plus, we definitely should avoid using raw pointers as much as possible, so this should be changed.

Ah yeah, looks like you're right that the compiler doesn't enforce it. That's unfortunate, but I agree this is still more readable and clear

This revision is now accepted and ready to land.Apr 5 2022, 1:10 AM
In D3487#98630, @palys-swm wrote:

It looks like there's a build error caused by some other diff

This is the error I already fixed on master so I'm going to retrigger CI, should be ok!