
[services-lib] Add BatchWriteItem utility function
Closed, Public

Authored by bartek on Aug 22 2023, 1:44 AM.

Details

Summary

From the parent diff description:

We use DDB batch operations in several places: reserved usernames write, saving new one-time keys, blob cleanup (soon), reports storage.

The DDB BatchWriteItem API can be cumbersome: a 25-item limit per call, unprocessed items in responses, throttling, etc. The docs also recommend an exponential backoff strategy. I decided to create an API that handles all this overhead for us. See also comments in D8452.

This diff uses some helpers from the parent diff (D8894).

Depends on D8894
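
A minimal sketch of the shape of such a helper, assuming the aws-sdk-dynamodb crate and tokio. The function name, constants, error enum, and inline backoff below are illustrative stand-ins for the actual D8894 helpers:

use std::time::Duration;
use aws_sdk_dynamodb::{types::WriteRequest, Client};

// DDB's hard limit on items per BatchWriteItem call.
const MAX_BATCH_SIZE: usize = 25;
const MAX_ATTEMPTS: u32 = 8;

#[derive(Debug)]
enum BatchWriteError {
  Aws(aws_sdk_dynamodb::Error),
  MaxAttemptsExceeded,
}

async fn batch_write(
  ddb: &Client,
  table_name: &str,
  requests: Vec<WriteRequest>,
) -> Result<(), BatchWriteError> {
  for chunk in requests.chunks(MAX_BATCH_SIZE) {
    let mut pending = chunk.to_vec();
    let mut delay = Duration::from_millis(100); // base retry delay
    let mut attempt = 0;
    while !pending.is_empty() {
      let output = ddb
        .batch_write_item()
        .request_items(table_name, pending)
        .send()
        .await
        .map_err(|e| BatchWriteError::Aws(e.into()))?;
      // Under throttling DDB may accept only part of a batch; whatever
      // comes back in unprocessed_items has to be retried.
      pending = output
        .unprocessed_items()
        .and_then(|items| items.get(table_name).cloned())
        .unwrap_or_default();
      if !pending.is_empty() {
        attempt += 1;
        if attempt >= MAX_ATTEMPTS {
          return Err(BatchWriteError::MaxAttemptsExceeded);
        }
        tokio::time::sleep(delay).await;
        delay *= 2; // exponential backoff between retries
      }
    }
  }
  Ok(())
}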

Test Plan
  1. Preparation

    On staging, created two tables: one with on-demand capacity and a second with provisioned capacity at 1 write capacity unit. Also created one on localstack. The tables were simple single-column tables.
  2. Tests

    Did the following:
    • Generated ~2k (not a multiple of 25) UUIDs (random strings). The first test was to save them all in one batch.
    • Repeated, but split into chunks of ~400 items (413 IIRC). Spawned a separate thread for each chunk and ran them concurrently (see the sketch after this list).
    • Also, during report service development, imported over 3K reports (100MB) using this function. Each 25-item batch was sent in a separate thread. This was done on localstack.
  3. Observations
    • Localstack doesn't do any throttling, so all items were written.
    • Similar for the on-demand capacity table. It is in fact able to throttle, but the limit is way too high for these tests to hit.
    • With the provisioned capacity table, I was able to observe several ProvisionedThroughputExceeded errors. Exponential backoff helped, and after a few retries all rows were written. However, for the large concurrent batch, I had to increase the base delay to give DDB time to recover.
    • Was never able to "partially" succeed - i.e. get a successful response where unprocessed_items was not empty.
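
A rough sketch of the concurrent-chunk test, using tokio tasks in place of raw threads and the hypothetical batch_write helper sketched in the summary; the table name and id attribute are made up, and build() returning a Result assumes a recent SDK version:

use aws_sdk_dynamodb::types::{AttributeValue, PutRequest, WriteRequest};

async fn concurrent_write_test(ddb: aws_sdk_dynamodb::Client) {
  // ~2k random IDs, deliberately not a multiple of 25.
  let ids: Vec<String> =
    (0..2001).map(|_| uuid::Uuid::new_v4().to_string()).collect();

  let mut tasks = Vec::new();
  for chunk in ids.chunks(413) {
    let requests: Vec<WriteRequest> = chunk
      .iter()
      .map(|id| {
        WriteRequest::builder()
          .put_request(
            PutRequest::builder()
              .item("id", AttributeValue::S(id.clone()))
              .build()
              .expect("item is set"),
          )
          .build()
      })
      .collect();
    // Client is cheap to clone; one task per chunk, all running concurrently.
    let ddb = ddb.clone();
    tasks.push(tokio::spawn(async move {
      batch_write(&ddb, "backoff-test-table", requests).await
    }));
  }
  for task in tasks {
    task.await.expect("task panicked").expect("batch write failed");
  }
}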

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

bartek held this revision as a draft.
bartek published this revision for review. Aug 22 2023, 1:50 AM
services/comm-services-lib/src/database.rs
388–389 ↗(On Diff #30179)

not a blocker, but we could probably make this cleaner by putting the is_enabled logic inside the Config or Helper class.

impl ExponentialBackoffConfig {
  pub fn backoff_enabled(&self) -> bool {
    self.max_attempts > 0
  }
}
389 ↗(On Diff #30179)

Seems a bit cleaner to use Config as a factory for creating the counter state machines.

409–410 ↗(On Diff #30179)

Similar to the other suggestion, this could just be a method on Config

Could we generalise this so it can also handle DeleteRequest?

Could we generalise this so it can also handle DeleteRequest?

Yes, this function can receive Vec<WriteRequest> instead of Vec<AttributeMap>. I'll do this.
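
With that change, callers could mix puts and deletes in one call. A hedged fragment (assuming the batch_write sketch above, a client named ddb, and a recent SDK where build() returns a Result):

use aws_sdk_dynamodb::types::{AttributeValue, DeleteRequest, PutRequest, WriteRequest};

// One batch carrying both a put and a delete via the shared WriteRequest type.
let put = WriteRequest::builder()
  .put_request(
    PutRequest::builder()
      .item("id", AttributeValue::S("new-item".into()))
      .build()
      .expect("item is set"),
  )
  .build();

let delete = WriteRequest::builder()
  .delete_request(
    DeleteRequest::builder()
      .key("id", AttributeValue::S("old-item".into()))
      .build()
      .expect("key is set"),
  )
  .build();

batch_write(&ddb, "some-table", vec![put, delete]).await;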

services/comm-services-lib/src/database.rs
389 ↗(On Diff #30179)

Sorry, this would be let mut exponential_backoff = config.create_counter();
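
A sketch of the factory shape being suggested; ExponentialBackoffHelper and its constructor are assumptions about the D8894 helper:

impl ExponentialBackoffConfig {
  // Config owns the policy; each call hands out a fresh counter state machine.
  pub fn create_counter(&self) -> ExponentialBackoffHelper {
    ExponentialBackoffHelper::new(self)
  }
}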

services/comm-services-lib/src/database.rs
389 ↗(On Diff #30179)

Yeah, it's cleaner. Thanks! 😁

I left a suggestion and a nit inline.

services/comm-services-lib/src/database.rs
375 ↗(On Diff #30355)

splitted -> split

411–415 ↗(On Diff #30355)

would prefer if we could do something like

let result = exponential_backoff
    .execute_with_backoff(|| ddb.batch_write_item().request_items(table_name, chunk).send())
    .await?;

execute_with_backoff would then be able to handle calling sleep_and_retry, reset, etc.
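
One possible shape for that combinator, written as a self-contained sketch (the struct fields and the retry-everything policy are assumptions; the real helper would likely retry only retriable errors such as ProvisionedThroughputExceeded):

use std::future::Future;
use std::time::Duration;

struct ExponentialBackoff {
  attempt: u32,
  max_attempts: u32,
  base_delay: Duration,
}

impl ExponentialBackoff {
  // Run the fallible async operation until it succeeds or attempts run out,
  // sleeping an exponentially growing delay between tries.
  async fn execute_with_backoff<F, Fut, T, E>(&mut self, mut operation: F) -> Result<T, E>
  where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
  {
    loop {
      match operation().await {
        Ok(value) => {
          self.attempt = 0; // reset the counter for subsequent calls
          return Ok(value);
        }
        Err(err) if self.attempt + 1 >= self.max_attempts => return Err(err),
        Err(_) => {
          self.attempt += 1;
          let delay = self.base_delay * 2u32.pow(self.attempt);
          tokio::time::sleep(delay).await;
        }
      }
    }
  }
}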

This revision is now accepted and ready to land. Aug 28 2023, 10:48 AM
services/comm-services-lib/src/database.rs
411–415 ↗(On Diff #30355)

Interesting idea. I'll be doing a refactor and will give it a try when implementing BatchGetItem support.