Page MenuHomePhabricator

[services-lib] Add BatchWriteItem utility function
ClosedPublic

Authored by bartek on Aug 22 2023, 1:44 AM.
Tags
None
Referenced Files
F3647295: D8895.id.diff
Sun, Jan 5, 12:12 AM
Unknown Object (File)
Tue, Dec 31, 4:08 AM
Unknown Object (File)
Tue, Dec 31, 4:08 AM
Unknown Object (File)
Tue, Dec 31, 4:08 AM
Unknown Object (File)
Tue, Dec 31, 4:08 AM
Unknown Object (File)
Tue, Dec 31, 4:07 AM
Unknown Object (File)
Tue, Dec 31, 3:48 AM
Unknown Object (File)
Tue, Dec 31, 12:02 AM
Subscribers

Details

Summary

From parent diff description:

We use DDB batch operations in several places: reserved usernames write, saving new one-time keys, blob cleanup (soon), reports storage.

DDB BatchWriteItem operation API can be cumbersome - 25 items limit, unprocessed items, throttling etc. Their docs also recommend exponential backoff strategy. I decided to create an API that handles all this overhead for us. See also comments in D8452.

This diff uses some helpers from parent diff (D8894).

Depends on D8894

Test Plan
  1. Preparation

    On staging, created two tables: one with on-demand capacity, second one with provisioned capacity, with 1 write capacity unit. Also created one on localstack. The tables were simple one column tables.
  1. Tests

    Did the following:
    • Generated ~2k (not a multiple of 25) UUIDs (random strings). First test was to save them all in one batch.
    • Repeated, but split into chunks of ~400 items (413 IIRC). Spawned a spearate thread for each chunk and ran them concurrently.
    • Also, during report service development, imported over 3K reports (100MB) using this function. Each 25-item batch was sent in a separate thread. This was done on localstack.
  1. Observations
    • Localstack doesn't do any throttling so all items were written.
    • Similiar for on-demand capacity table. In fact it is able to throttle, but the limit is way too high for my tests.
    • With provisioned capacity table, I was able to observe several ProvisionedThroughputExceeded errors. Exponential backoff helped with them and after a few retries all rows were able to be written. However, for large concurrent batch, I had to increase the base delay to give DDB time to get up.
    • Never was able to "partially" succeed - i.e. the response was successful, but unprocessed_items was not empty.

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

bartek held this revision as a draft.
bartek published this revision for review.Aug 22 2023, 1:50 AM
services/comm-services-lib/src/database.rs
388–389

not a blocker, but we could probably make this cleaner, by putting the is_enabled logic inside of the Config or Helper class.

impl ExponentialBackoffConfig {
  pub fn backoff_enabled(&self) {
     return self.max_attempts > 0;
  }
}
389

Seems a bit cleaner, use Config as a factory for creating the counter state machines.

409–410

Similar to the other suggestion, this could just be a method on Config

Could we generalise this so it can also DeleteRequest?

Could we generalise this so it can also DeleteRequest?

Yes, this function can receive Vec<WriteRequest> instead of Vec<AttributeMap>. I'll do this

services/comm-services-lib/src/database.rs
389

Sorry, this would be let mut exponential_backoff = config.create_counter();

services/comm-services-lib/src/database.rs
389

yeah, it's cleaner, Thanks! 😁

i left a suggestion and nit inline

services/comm-services-lib/src/database.rs
375 ↗(On Diff #30355)

splitted -> split

411–415 ↗(On Diff #30355)

would prefer if we could do something like

let result =exponential_backoff
    .execute_with_backoff(|| ddb.batch_write_item().request_items(table_name, chunk).send())
    .await?;

execute_with_backoff would then be able to handle calling sleep_and_retry, reset, etc.

This revision is now accepted and ready to land.Aug 28 2023, 10:48 AM
services/comm-services-lib/src/database.rs
411–415 ↗(On Diff #30355)

Interesting idea. I'll be doing a refactor and give it a try when implementing BatchGetItem support