Page MenuHomePhabricator

[services] Tunnelbroker - Add exponential backoff related constants
ClosedPublic

Authored by max on Jul 4 2022, 4:02 PM.
Tags
None
Referenced Files
F3673203: D4436.id.diff
Mon, Jan 6, 5:52 AM
F3673066: D4436.id14382.diff
Mon, Jan 6, 5:33 AM
F3673045: D4436.id14136.diff
Mon, Jan 6, 5:33 AM
F3673032: D4436.id14575.diff
Mon, Jan 6, 5:32 AM
Unknown Object (File)
Fri, Jan 3, 5:30 AM
Unknown Object (File)
Sat, Dec 28, 3:14 PM
Unknown Object (File)
Thu, Dec 26, 4:34 AM
Unknown Object (File)
Thu, Dec 26, 4:34 AM

Details

Summary

This diff introduces additional constants for the function for exponential backoff timeout which will be used in DynamoDB batch write operation and throttling. The function itself is introduced in D4373, and the usage of these constants is in D4378.

DYNAMODB_BACKOFF_FIRST_RETRY_DELAY - Is an exponential backoff first delay in milliseconds.
DYNAMODB_MAX_BACKOFF_TIME - Maximum delay timeout after which the operation would be considered as failed.

Test Plan

No testing is needed, it's just a constant.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

max held this revision as a draft.
max published this revision for review.Jul 5 2022, 5:53 AM
max edited the summary of this revision. (Show Details)

Where did these values come from?

In D4436#126148, @karol-bisztyga wrote:

Where did these values come from?

Thanks for questioning this!
It's an optimized value from my perspective.
DYNAMODB_BACKOFF_STEP - the 5 ms with an exponential backoff should be a good value in case we will experience a throttling from the AWS.
DYNAMODB_MAX_BACKOFF_TIME - 10 sec as a maximum wait time until we consider the operation is failed seems to be enough too. From my perspective, the maximum wait time from the database should not be greater than 5-10 seconds. To avoid "flooding" the database by the requests in case of the database is down or has performance issues. But curious for any thoughts about this.

tomek requested changes to this revision.Jul 8 2022, 2:04 AM

I've found a document https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.RetryAndBackoff where it is stated that

In addition to simple retries, each AWS SDK implements an exponential backoff algorithm for better flow control.

So it seems the SDK should include this algorithm. Can you check if this is the case?

services/tunnelbroker/src/Constants.h
12 ↗(On Diff #14136)

In the document they present an example

For example, up to 50 milliseconds before the first retry

So maybe we should increase this number. 5ms sounds like a really small delay in network world.

Also, in exponential backoff we multiply the delay after each retry, so naming it a step may suggest that it is increased instead of multiplied. A better name could be e.g. DYNAMODB_BACKOFF_FIRST_RETRY_DELAY

13 ↗(On Diff #14136)

In the linked document, they suggest to

Set the maximum number of retries to stop around one minute.

But I guess setting it to something smaller is ok

This revision now requires changes to proceed.Jul 8 2022, 2:04 AM

DYNAMODB_BACKOFF_STEP changed to DYNAMODB_BACKOFF_FIRST_RETRY_DELAY. Value is increased from 5 ms to 50 ms for the first step.

max added inline comments.
services/tunnelbroker/src/Constants.h
12 ↗(On Diff #14136)

In the document they present an example

For example, up to 50 milliseconds before the first retry

So maybe we should increase this number. 5ms sounds like a really small delay in network world.

Also, in exponential backoff we multiply the delay after each retry, so naming it a step may suggest that it is increased instead of multiplied. A better name could be e.g. DYNAMODB_BACKOFF_FIRST_RETRY_DELAY

I've changed the constant name and increased the value to 50ms due to the algorithm changes in D4373

13 ↗(On Diff #14136)

In the linked document, they suggest to

Set the maximum number of retries to stop around one minute.

But I guess setting it to something smaller is ok

I think in our case minute is too much. In case of high usage, it will cause a flood of "waiting" requests if the database becomes unavailable for some time. If the usage scenario provides rare requests to the DB it will be ok to go with some big amount of time, but in our case small is better, because it's better to return a temporary error to the client instead of flooding the database with the requests. From my perspective is: higher potential usage - smaller timeout.

In D4436#127443, @palys-swm wrote:

I've found a document https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html#Programming.Errors.RetryAndBackoff where it is stated that

In addition to simple retries, each AWS SDK implements an exponential backoff algorithm for better flow control.

So it seems the SDK should include this algorithm. Can you check if this is the case?

I saw this page too and checked the C++ SDK code and seems that we should implement it by our-self in case of use in the batch writes scenario. Even the demo code for batchWrite from AWS C++ SDK suggests implementing it.

This revision is now accepted and ready to land.Jul 18 2022, 6:52 AM