Page MenuHomePhabricator

[services] Backup - Blob Put Client - Add essential objects
ClosedPublic

Authored by karol on Sep 1 2022, 4:06 AM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Dec 9, 3:40 AM
Unknown Object (File)
Tue, Dec 3, 10:53 PM
Unknown Object (File)
Tue, Dec 3, 10:53 PM
Unknown Object (File)
Tue, Dec 3, 10:27 PM
Unknown Object (File)
Fri, Nov 29, 12:30 AM
Unknown Object (File)
Nov 25 2024, 7:04 AM
Unknown Object (File)
Nov 22 2024, 6:21 AM
Unknown Object (File)
Nov 22 2024, 6:01 AM

Details

Summary

Depends on D5004

Adding necessary imports and objects that we're going to use in the put client.

Test Plan

backup builds

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

services/backup/blob_client/src/put_client.rs
28 ↗(On Diff #16165)

Still not convinced that Arc needs to be used with lazy_static

https://linear.app/comm/issue/ENG-1716#comment-5a5f8aef

31 ↗(On Diff #16165)

What are error messages used for? and do we need to really manage them?

Should they just be emitted and moved on?

services/backup/blob_client/src/put_client.rs
28 ↗(On Diff #16165)
31 ↗(On Diff #16165)

I don't think so. The purpose is to detect any errors in the async functions and break the flow yielding an error to the c++ if they occur. This was the simplest/fastest way of doing this but I'm open to suggestions. We can change it but I don't think we should just print them and move on. panic is not a way to go either.

jon requested changes to this revision.Sep 2 2022, 10:40 AM
jon added inline comments.
services/backup/blob_client/src/put_client.rs
31 ↗(On Diff #16165)

The purpose is to detect any errors in the async functions and break the flow yielding an error to the c++ if they occur.

If we enter an invalid state, then we should add some logic to get back to a valid state if possible.
The logic right now in the other diffs is that some errors will be emitted, but the functions just starts returning Ok(())'s which means everything is silently failing from the perspective of C++.

We can change it but I don't think we should just print them and move on

If all we can do is just report the status, then we should print it and return that it failed. I don't see the value in indefinitely accumulating error messages.

panic is not a way to go either.

Agreed, which is also why I don't like unwrap()

This revision now requires changes to proceed.Sep 2 2022, 10:40 AM
services/backup/blob_client/src/put_client.rs
31 ↗(On Diff #16165)

If we enter an invalid state, then we should add some logic to get back to a valid state if possible.

What do you mean exactly? I don't think this is a good idea. I think, if we enter an invalid state, we should break the connection and return a grpc status with an error.

The logic right now in the other diffs is that some errors will be emitted, but the functions just starts returning Ok(())'s which means everything is silently failing from the perspective of C++.

If that's what's happening, then it is a mistake, I'm going to look at this code once again, if there's ever an error, we should exit with a proper Result(Err).

I generally agree that the error handling here needs to be repaired big time. I just did it quickly and dirty.

I created a task for this https://linear.app/comm/issue/ENG-1734/improve-error-handling-system-in-the-blob-client

What do you mean exactly?

If the client gets to be in an unhealthy state, then there should be some logic to get try and get the client back into a healthy state and retry the get/put. Currently, an unhealthy client will both report an error and drop the message.

I don't think this is a good idea. I think, if we enter an invalid state, we should break the connection and return a grpc status with an error.

Yes, but the message also gets dropped. There should be some re-try logic (which can be done in another ticket).

jon requested changes to this revision.Sep 6 2022, 10:13 AM
This revision now requires changes to proceed.Sep 6 2022, 10:13 AM

If the client gets to be in an unhealthy state, then there should be some logic to get try and get the client back into a healthy state and retry the get/put. Currently, an unhealthy client will both report an error and drop the message.

Sorry, but I disagree. I think when the client gets into an unhealthy state, it should drop the connection and return an error.

If we wanted to really attempt to retry, we should at least add some logic of when to do this and when not. If we get an error like provided holder already exists in the database, then what is the point of retrying? If we try again, it won't disappear magically I guess, will it? And adding logic for this will probably result in adding an additional layer of abstraction or something - we could have a type of error that has a flag shouldRetry, something like this. All in all, don't think we should do it here and I don't think this is necessary and important either.

The way I see it now is that we could have a button on the client side in the UI saying retry when there's an error returned.

If we get an error like provided holder already exists in the database, then what is the point of retrying? If we try again, it won't disappear magically I guess, will it?

Wouldn't that mean that a previous put was already successful? Seems like a state that shouldn't ever occur, but I agree it should be handled by dropping.

I think the concept of the client and message are being fused here. There's a difference between a message being handled, and the client being healthy.

I'm refering to the lifecycle of a BidiClient, not the lifecycle of a message passed through a BidiClient.

Either way, seems like we are just pushing this through, and creating tasks for others to address concerns.

This revision is now accepted and ready to land.Sep 7 2022, 10:29 AM