Page MenuHomePhabricator

[services] Backup - Add helper method for extra data
ClosedPublic

Authored by karol on Jul 19 2022, 1:27 AM.
Tags
None
Referenced Files
Unknown Object (File)
Oct 26 2024, 5:06 AM
Unknown Object (File)
Oct 26 2024, 5:06 AM
Unknown Object (File)
Oct 26 2024, 5:06 AM
Unknown Object (File)
Oct 26 2024, 5:06 AM
Unknown Object (File)
Oct 26 2024, 5:06 AM
Unknown Object (File)
Oct 26 2024, 5:06 AM
Unknown Object (File)
Oct 26 2024, 5:06 AM
Unknown Object (File)
Oct 26 2024, 5:06 AM

Details

Summary

Depends on D4554

Adding helper function PrepareDataChunkWithExtraData that performs necessary calculations. Context is in point 2 of this comment: https://phab.comm.dev/D4439#126984

Test Plan

full test plan in D4556, for now, the service builds

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

tomek requested changes to this revision.Jul 21 2022, 2:50 AM
tomek added inline comments.
services/backup/src/Reactors/server/PullBackupReactor.cpp
180–182 ↗(On Diff #14617)

This name is quite misleading. When reading it, it gives an impression that the function will add some extra bytes. Instead the function only makes sure that chunk plus extra bytes doesn't exceed the limit. Maybe using a word padding would be better?

185–187 ↗(On Diff #14617)

It doesn't seem really beneficial, but when using a buffer we could handle chunks bigger than the limit. But currently it doesn't seem like doing that would make sense.

192 ↗(On Diff #14617)

Shouldn't we clear internalBuffer? Or moving it is enough to make sure that this->internalBuffer.size() would be 0 in the next iteration?

196–198 ↗(On Diff #14617)

Why do we compare it with limit / 2? Can we find a solution where this exception doesn't need to be thrown?

This revision now requires changes to proceed.Jul 21 2022, 2:50 AM
services/backup/src/Reactors/server/PullBackupReactor.cpp
180–182 ↗(On Diff #14617)

Ok, I don't mind naming that much.

185–187 ↗(On Diff #14617)

Maybe, but this situation should simply never occur and if this check passes, it means that something went really, really wrong.

192 ↗(On Diff #14617)

Moving suffices.
This code

// Example program
#include <iostream>
#include <string>

int main()
{
  std::string s1 = "aaa";
  std::string s2 = std::move(s1);
  std::cout << s1.size() << std::endl;
  std::cout << s2.size() << std::endl;
}

Gives output:

0
3
196–198 ↗(On Diff #14617)

I think we should check for the size and set the limit at some point. At most, it should be just chunkLimit - extraBytesNeeded(aka padding). This is because, if the internal buffer grows beyond the chunk limit, we'll not be able to send it in the end - in the end, we check if the internal buffer is empty and if it's not, we're sending one more message containing its contents.

In case we don't set an upper limit for this, there are 2 problems:

  • we'll have to implement even more complicated logic for sending more than one message in the end
  • we won't control the growth of the internal buffer, and what comes next, we'll not be able to know how much data we'll have to allocate for the internal buffer.

I'm going to change the limit from the chunkLimit/2 to chunkLimit - padding

One more thing: This is a sanity check, in the real world this should never be exceeded. Let's assume we have 10KB of the additional data (which is way too much for a couple of ids anyway). Then, with every message the internal buffer's going to grow by 10KB, which means it's going to reach this new limit after 399 messages:

4MB = 4 000 KB
limit = 4 000KB - 10KB = 3990KB
3990 / 10 = 399

That means, we'd need a blob that would take 399 * 4MB = 1596MB = 1,5GB+
Since 10KB for ids is way too much, let's go with 1KB as well:

limit = 4 000KB - 1KB = 3999KB
3999 / 1 = 3999
3999 * 4MB = 15996MB = 15GB

I don't think we're going to end up having a simple blob of 16GB, even 1.5GB. Please, keep in mind that we're talking about the blobs that are supposed to be fetched through the backup service. So that doesn't apply to pics/vids, just the database records since media's going to be added as attachment holders and fetched directly from the blob to the client, right (hope I didn't miss anything)?

karol edited the test plan for this revision. (Show Details)

rename as requested, increate internal buffer limit

tomek requested changes to this revision.Jul 22 2022, 1:55 AM
tomek added inline comments.
services/backup/src/Reactors/server/PullBackupReactor.cpp
180–182 ↗(On Diff #14617)

I was rather thinking about finding a better name for this whole method - can you spend some time searching for a better one?

196–198 ↗(On Diff #14617)

Introducing this arbitrary limit on blob size doesn't make any sense. Even if it's now unlikely that the blob would be that big, we don't gain anything from introducing this limit and changing it in the future would be more complicated that doing it right from the beginning. It's just a shortcut and we should avoid doing this. When an internal buffer grows up to some point, we should simply send its data - why can't we do something like this?

This revision now requires changes to proceed.Jul 22 2022, 1:55 AM
services/backup/src/Reactors/server/PullBackupReactor.cpp
180–182 ↗(On Diff #14617)

Ok, I'm going to change this name to PrepareDataChunkWithPadding

196–198 ↗(On Diff #14617)

My motivation was to avoid even more complex logic. This code is already complicated and may be hard to read. We can send the data every time it goes up to the limit but it's going to make the machine state harder to understand and maintain. I understand your point, but please note, that it wasn't "pointless" - I just didn't want to sacrifice readability for an edge case that will probably never happen.

But I understand what you're saying and I basically agree. I still think the entire idea here is pretty bad, but I will update this logic anyway.

start method name with lowercase

update the logic so the internal buffer is being sent if it exceeds the limit instead of throwing there

This revision is now accepted and ready to land.Jul 25 2022, 2:25 AM
services/backup/src/Reactors/server/PullBackupReactor.cpp
65 ↗(On Diff #14811)

In which diff this code is introduced? I can't see it in any of this diff's dependencies.

We can make this condition less restrictive

services/backup/src/Reactors/server/PullBackupReactor.cpp
65 ↗(On Diff #14811)

In which diff this code is introduced? I can't see it in any of this diff's dependencies.

D4555

We can make this condition less restrictive

Right, I will update this code in the diff I mentioned, thanks!