
[services] Blob - Add extra bytes needed to get request
AbandonedPublic

Authored by karol on Jul 5 2022, 12:45 AM.
Tags
None

Details

Reviewers
tomek
ashoat
Summary

Depends on D4438

"Extra bytes needed" is a field that stands for how many bytes we need for any other data than the data that exceeds the grpc data limit and has to be chunked.

Example:
chunk limit = 4MB
We want to send 6MB of data
We also want to send some metadata/additional info, for example the id of the user who owns this data.
Let's assume the metadata takes 0.1MB. Without it, we would send chunks of [4MB, 2MB], but since we have this additional information, we have to subtract its size from the maximum amount of data we're able to send in every gRPC message.
So, in our example, we'll send the data in chunks of [3.9MB, 2.1MB], because every message also has to carry the 0.1MB of metadata.
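
For illustration, here is a rough C++ sketch of the chunk-size arithmetic from the example above; the constant and function names are mine, not the actual service code:

```cpp
#include <cstddef>
#include <stdexcept>

// 4MB gRPC message limit, matching the chunk limit used in the example.
constexpr std::size_t GRPC_CHUNK_SIZE_LIMIT = 4 * 1024 * 1024;

// Chunk size the blob service would use after reserving the caller's extra bytes.
std::size_t effectiveChunkLimit(std::size_t extraBytesNeeded) {
  if (extraBytesNeeded >= GRPC_CHUNK_SIZE_LIMIT) {
    throw std::invalid_argument("extra bytes exceed the gRPC chunk limit");
  }
  // With 0.1MB of metadata this yields ~3.9MB, so 6MB of data goes out as
  // chunks of [3.9MB, 2.1MB] instead of [4MB, 2MB].
  return GRPC_CHUNK_SIZE_LIMIT - extraBytesNeeded;
}
```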

Test Plan
cd services
yarn run-integration-tests blob

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

Harbormaster returned this revision to the author for changes because remote builds failed. Jul 5 2022, 12:45 AM
Harbormaster failed remote builds in B10248: Diff 14140!
Harbormaster returned this revision to the author for changes because remote builds failed. Jul 5 2022, 1:12 AM
Harbormaster failed remote builds in B10252: Diff 14144!
karol edited the summary of this revision.
karol edited the test plan for this revision.
karol added a reviewer: tomek.
karol added a reviewer: ashoat.

CI is broken

ashoat requested changes to this revision. Jul 5 2022, 11:53 AM

I'm concerned that we're simply trusting the client to truthfully state this info. @karol-bisztyga, can you come up with a way we can move forward here without relying on the client to communicate this info to the blob service?

Additionally, I'm a bit confused about the context here. Is the work here related to the discussion in ENG-1029 about keeping packets below the gRPC max size?

Is there any previous discussion you can link to provide context about why we're making this change? The context in the summary simply isn't enough for me to go off on.

This revision now requires changes to proceed. Jul 5 2022, 11:53 AM

I'm concerned that we're simply trusting the client to truthfully state this info. @karol-bisztyga, can you come up with a way we can move forward here without relying on the client to communicate this info to the blob service?

What's wrong with this? It's just the client saying how many fewer bytes than the gRPC chunk limit it is able to receive. If it provides a number larger than the limit, we'll throw an error. If you're afraid the resulting chunk limit might end up too small and cause too many round trips, we can also add a lower bound, saying the effective chunk limit may never drop below 1 or 2 MB, for instance. What are the other concerns here?
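
A rough sketch of that validation, with an assumed 1MB floor and illustrative names (not the actual implementation):

```cpp
#include <cstddef>
#include <stdexcept>

constexpr std::size_t GRPC_CHUNK_SIZE_LIMIT = 4 * 1024 * 1024;
// Hypothetical lower bound on the effective chunk size, as suggested above.
constexpr std::size_t MIN_EFFECTIVE_CHUNK_SIZE = 1 * 1024 * 1024;

void validateExtraBytesNeeded(std::size_t extraBytesNeeded) {
  if (extraBytesNeeded >= GRPC_CHUNK_SIZE_LIMIT) {
    throw std::invalid_argument("extraBytesNeeded exceeds the gRPC chunk limit");
  }
  if (GRPC_CHUNK_SIZE_LIMIT - extraBytesNeeded < MIN_EFFECTIVE_CHUNK_SIZE) {
    throw std::invalid_argument("effective chunk size would be too small");
  }
}
```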

To answer the latter: unfortunately, I don't see another way of doing this. We could do something like this: pull the full amount of data from the blob to the backup (4MB), and if the backup then sees that it has so much extra data that it can't send the full chunk, store the rest of the data temporarily and append it to the next chunk. In my opinion this solution is bad, because the leftover grows with every chunk and eventually exceeds the chunk limit itself. Let's go through an example, as this is probably confusing:

  • we have 100MB of data on the blob for some backup.
  • we have 4MB of chunk limit
  • on backup, for every message, we send 0.1 MB of some extra data/metadata
  • we do this until all the data (100MB) is sent
    • take the maximum amount of data from the blob to the backup - 4MB
    • on the backup, you have 4MB of data from blob + 0.1MB of metadata
    • it's impossible to send 4.1MB of data to the client because you still have a 4MB limit from grpc
    • you send 3.9MB of blob data + 0.1MB of metadata to the client, saving the remaining 0.1MB of the blob data locally
    • you pull another chunk of data from the blob, then you append this 0.1MB of locally saved data, and now you have 4.1MB of the blob data
    • the metadata still takes 0.1MB, we can again only send 3.9MB to the client, so now we're going to have 0.2MB of the blob data to save locally

You can probably see where this leads. Eventually we'll have 4MB of data saved locally, and then of course we can just send it as a separate chunk (see the sketch below). But just think how many pitfalls this approach has, how much more testing it's going to require, and how much more logic it's going to contain.
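
For reference, a minimal C++ sketch of this carry-over approach (illustrative names, not the actual service code); it shows how the leftover buffer accumulates:

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t GRPC_CHUNK_SIZE_LIMIT = 4 * 1024 * 1024;

// Re-slice blob chunks so every outgoing message leaves `metadataSize` bytes
// free; whatever does not fit is carried over to the next message.
std::vector<std::string> forwardWithCarryOver(
    const std::vector<std::string> &blobChunks, std::size_t metadataSize) {
  const std::size_t capacity = GRPC_CHUNK_SIZE_LIMIT - metadataSize;
  std::vector<std::string> outgoing;
  std::string carryOver;
  for (const std::string &chunk : blobChunks) {
    carryOver += chunk;
    // Emit as many client-sized payloads as the buffer allows; the tail waits
    // for the next blob chunk. With full 4MB blob chunks the tail grows by
    // `metadataSize` every cycle, which is the buildup described above.
    while (carryOver.size() >= capacity) {
      outgoing.push_back(carryOver.substr(0, capacity));
      carryOver.erase(0, capacity);
    }
  }
  if (!carryOver.empty()) {
    outgoing.push_back(carryOver);
  }
  return outgoing;
}
```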

I think just passing the number of bytes we need for metadata to the blob in the request is much simpler and way less harmful.

Maybe there's yet another way of doing all this but I'm not seeing it, sorry.

Additionally, I'm a bit confused about the context here. Is the work here related to the discussion in ENG-1029 about keeping packets below the gRPC max size?

No, it's more that you pointed out in D4245 that the attachments may be sent along with the chunks; I then observed that we hadn't taken into consideration that there may be a lot of extra data sent together with the data chunks, and we should definitely handle this.

Is there any previous discussion you can link to provide context about why we're making this change? The context in the summary simply isn't enough for me to go off on.

It's just based on D4245

native/cpp/CommonCpp/grpc/protos/blob.proto:31

should be optional

ashoat requested changes to this revision. Jul 6 2022, 9:42 AM

I'm unfortunately still confused here. I read through @karol-bisztyga's comment twice, but I still don't understand what this is solving and why it's necessary.

Maybe the easiest way forward would be for @palys-swm to take a first pass? I'm curious for his perspective, and in particular I'm curious if we can come up with a way we can move forward here without relying on the client to communicate this info to the blob service.

This revision now requires changes to proceed. Jul 6 2022, 9:42 AM

If you still want to re-request review on this one, can you please INSTEAD schedule a 1:1 to discuss? Here's a link. I would really appreciate it if you do not re-request review here before we have time to discuss 1:1.

I'm unfortunately still confused here. I read through @karol-bisztyga's comment twice, but I still don't understand what this is solving and why it's necessary.

This is disappointing to read, not only because I tried to explain this as cleanly as I could, but also because you didn't say what exactly is unclear. That makes it hard to explain further, because in the end we need you to understand this - otherwise it will stay blocked indefinitely. Next time, if possible, please specify exactly what is unclear and what I should explain in more detail.

Ok, I'm not sure exactly what you don't understand or how I can make it clearer. Let's go through it step by step (a rough sketch of the final backup => client step follows the list):

  1. we have 100MB of data on blob
  2. we do "pull backup" from the client
  3. the flow goes into the backup, and the backup asks the blob for the data
  4. there's 100MB of data to send blob => backup => client
  5. 100MB > 4MB, where 4MB is the chunk limit - the maximum amount of data we are able to send in one gRPC chunk
  6. we have to use chunks, this means we have to slice the data and send it in multiple responses
  7. now, let's assume that we have to send some additional data in the backup => client response, for example the backupID
  8. let's assume the backup id takes 0.1MB - it won't actually take that much space, but there may be other "extra data" and we have to assume it can get large (0.1MB is chosen just to keep the arithmetic clean; in practice we should count every byte, but writing 4MB - 20 bytes would make the example harder to follow)
  9. we pull a chunk blob => backup - 4MB of data, because we want to use the maximum of gRPC's capacity
  10. this is because we want to use as few cycles as possible
  11. again, we pull 4MB of data from blob to backup
  12. we receive 4MB of data in the backup service
  13. we have to pass this 4MB of data (received from the blob) from the backup to the client
  14. now, look at point 7 - we have 0.1MB of additional/extra data to send (backup id in this case)
  15. extra data takes 0.1MB, data chunk from the blob takes 4MB
  16. 4MB + 0.1MB = 4.1MB
  17. 4.1MB > 4.0MB, 4.0MB is a grpc chunk limit
  18. we are unable to send all the data with a single chunk from the backup to the client and this is a problem
  19. my solution is to let us tell the blob how much smaller every chunk should be, so we can add the additional data without exceeding the 4MB limit when passing the data to the client
  20. with my solution, we tell the blob "hey, get me 0.1MB smaller chunks"
  21. now the blob sets the chunk limit to 4.0MB - 0.1MB = 3.9MB
  22. in the situation from point 11/12, we receive 3.9MB (instead of 4.0MB) on the backup service
  23. we add 0.1MB of our extra data
  24. we send to the user 3.9MB (data chunk from the blob) + 0.1MB (extra data, backup id in this case)
  25. 0.1MB + 3.9MB = 4MB which doesn't exceed the 4MB grpc chunk limit. Therefore we are able to send the data from the backup to the client without problems
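
A tiny C++ sketch of the final backup => client step (points 22-25), with illustrative names only:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t GRPC_CHUNK_SIZE_LIMIT = 4 * 1024 * 1024;

// The blob already shrank its chunk by the requested extra bytes, so the
// chunk plus the metadata is guaranteed to fit into one gRPC message.
std::string buildClientMessage(
    const std::string &blobChunk, const std::string &metadata) {
  assert(blobChunk.size() + metadata.size() <= GRPC_CHUNK_SIZE_LIMIT);
  // In the real proto these would be separate message fields; concatenation
  // here just stands in for "one gRPC response".
  return metadata + blobChunk;
}
```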

Please tell me exactly which of these points you understand and which you do not:

  1. do you understand this point? If not, please specify what is unclear.
  2. do you understand this point? If not, please specify what is unclear.
  3. do you understand this point? If not, please specify what is unclear.
  4. ...
  5. ...

@ashoat

tomek requested changes to this revision. Jul 7 2022, 6:43 AM

It looks like this diff is trying to solve a real issue, which we definitely need to address, but we should also think about the consequences.

Overall, we should make our API as convenient as possible by placing the complexity on the server side and making clients less complicated. This is beneficial because we can assume that any service will be used by many clients, and writing complicated logic in more places is error-prone. So if there's a way for the API to be easier to consume, we should do that.

In this case the underlying issue is that services talk to each other, each of them can add some (meta)data, and if one service uses the whole message capacity, a subsequent service cannot add to it. The proposed solution should solve the issue, but it complicates API usage. Also, if the chain of services is longer, each of them has to add its own share to extra_bytes_needed, which can also be fragile.

Regarding the alternatives, @karol-bisztyga proposed a solution:
We can send however many bytes we want and keep the leftover bytes waiting for the next payload they can be appended to. The remaining bytes may build up, so we can either send them when they grow to the chunk limit, or send them at the end.
This solution addresses the issue, as clients do not need to share a detail about how many extra bytes they need to add.

We can also consider some other options. Overall, the goal here is to have as few chunks as possible, because we think this will improve performance. But are we sure that this is the case? Have we measured how long it takes to process a chunk compared to the time needed to send the payload? If we want to send encrypted data, how long would it take to decrypt it compared to sending it from the services?
It feels like optimizing the number of requests might be premature and we should instead focus on delivering something that works well enough. A possible solution might be:
If we receive some data and realize that after adding the metadata the chunk is too big, we can split it in half and send both payloads. If we are afraid that this will produce too many chunks, we can modify the blob service to send up to e.g. 95% of the limit - the case where the backup adds more than 5% of the available chunk should be rare, and in that case we would split the chunk in half.
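
A small C++ sketch of this split-in-half fallback (illustrative names, assuming the metadata is well under half the limit):

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t GRPC_CHUNK_SIZE_LIMIT = 4 * 1024 * 1024;

// If the chunk plus the metadata would exceed the gRPC limit, send two
// smaller payloads instead of one.
std::vector<std::string> splitIfNeeded(
    const std::string &chunk, std::size_t metadataSize) {
  if (chunk.size() + metadataSize <= GRPC_CHUNK_SIZE_LIMIT) {
    return {chunk};
  }
  const std::size_t half = chunk.size() / 2;
  // Each half plus the metadata now fits, since the metadata is assumed to
  // be well under half the limit.
  return {chunk.substr(0, half), chunk.substr(half)};
}
```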

Your explanation is really long... I appreciate that you are trying to be thorough, but it actually has the opposite effect from your goal... it makes it even more expensive for me to understand what is going on here.

The goal should be to explain simply and concisely what the problem is. This might require repeated revision.

After reading, I have a hypothesis as to what you are trying to say. Let me try to state it as I would've hoped you would've stated it in the diff description, and you can tell me if I'm misinterpreting:

The backup service uses the blob service for storage. When a client needs to recover a backup, the backup service needs to return that backup in chunks to the client because gRPC has a limit to the size of data it can send in a single message (4 MiB).

Since the data is ultimately stored in the blob service, those individual chunks need to be fetched from there. But the backup service needs to forward those chunks to the client.

The ultimate issue is this: the backup service will be adding some additional data to the chunk before forwarding it down to the client, but the blob service doesn't have full awareness of the size of this additional data. As a result, there is no way for the blob service to know the exact right size of the chunk it should send in order for the ultimate chunk sent down by the backup service to be precisely 4 MiB.

There are several solutions we could pursue here:

  1. We could simply return much smaller chunks from the blob service, eg. 2 MiB. Then we could be fairly confident that the backup service's additional metadata would not lead to a chunk that is larger than 4 MiB. (We may need to consider the implications of patterns like repeated here.)
  2. The backup service could maintain its own buffer. It would fetch 4 MiB chunks from the blob service, and then cut out extra data so that the chunk it returns to the client does not exceed the chunk limit. Then when the next chunk is received, the backup service would again slice it, and this time concatenate the end portion from the previous chunk to the front.
  3. The solution I propose here: allow the backup service to directly communicate the size of the chunk it needs from the blob service. In this case I have introduced a field that allows the backup service to communicate the size of its additional metadata to the blob service, and the blob service will simply subtract that from the chunk size it returns. But we could alternately simply have the backup service specify the max chunk size it can accept.

I'm curious for perspectives on the above options from @karol-bisztyga and @palys-swm. I think 2 is my favorite... I don't think it should be that hard to implement, and it will make the API design simpler. Next favorite is 1, as long as we don't face any issues with eg. repeated, or even a hypothetical future repeated. As for 3, I honestly am not a fan of this new field. If we did decide to keep it I would rather have it be maxChunkSize or something.

The 3rd option isn't ideal because a service would need to know exactly how gRPC works and how many bytes it adds... and that could change with any new release of the library.
The 2nd option is probably the best, as it will work in any scenario, but it is a lot more complicated than the 1st... if we realize the 2nd option would take really long for some reason, we can go with the 1st or consider the alternative I proposed in an earlier comment (splitting a message in two if it exceeds the limit), but we should probably start by trying the 2nd.
We could also do some performance tests, and if we realize that having twice as many messages doesn't affect performance significantly, we can choose to go with the 1st.

Ok, I still think that the 3rd option is the best, but since I'm outvoted, I'm going to go with option 2. Option 1 seems like a quick fix, and I don't think it ultimately delivers an efficient solution.

I've been thinking about option 2 for some time, and it's going to require a lot more work and testing (gluing the data back together so it stays consistent, many more edge cases, etc.). The code is therefore going to be more complicated and error-prone. We also have to pay attention to allocating buffers properly so we use as little memory as possible. I think there are so many pitfalls that it isn't a good approach, but ok, I'm going to try my best to implement it as simply and robustly as possible.

This diff and the following diffs in this stack are probably going to be abandoned once I put up a new solution.