Page MenuHomePhabricator

[comm-lib] Function for calculating dynamodb item size
ClosedPublic

Authored by michal on Dec 19 2023, 6:23 AM.
Tags
None
Referenced Files
F2205339: D10405.diff
Sat, Jul 6, 8:35 PM
Unknown Object (File)
Fri, Jul 5, 7:42 PM
Unknown Object (File)
Thu, Jul 4, 6:14 PM
Unknown Object (File)
Thu, Jul 4, 10:59 AM
Unknown Object (File)
Tue, Jul 2, 6:43 AM
Unknown Object (File)
Sat, Jun 29, 9:39 AM
Unknown Object (File)
Tue, Jun 25, 5:12 AM
Unknown Object (File)
Tue, Jun 25, 3:24 AM
Subscribers

Details

Summary

Now that we will have two places where we need to calculate the size of an item in dynamodb, it would be good to introduce an abstraction that will do it for us. This diff introduces a function that will take a AttributeMap and return size of the item. The specific values were taken from aws docs, and this blogpost.

Differences from the current method (in terms of the final size):

  • this method doesn't count length of optional attribute names if they don't exist in the final item
  • lest error-prone -> e.g. the current implementation skipped value of creation_time field during size calculations

A problem with this approach is unnecessary allocations when creating the AttributeMap that will be then discarded. There's a chance that Rust will optimize it away, but I haven't tested it. In any case the "correct" solution would be something with either derive macros or custom serializers with serde. But the code in this diff is very localized so it will be easy to replace if it turned out to be a performance problem in the future.

Depends on D10402

Test Plan

cargo check in report service.
I run the current calculations and the new function, and for this item:

ReportItem {
  id: ReportID::default(),
  user_id: "user_id".to_string(),
  report_type: ReportType::ErrorReport,
  platform: ReportPlatform::MacOS,
  creation_time: Utc::now(),
  content: BlobOrDBContent::Blob(BlobInfo::from_bytes(b"data")),
  encryption_key: None,
};

The current soluton returned 230 and the new one returned 259. The slight difference can be attributed to counting the length of creation_time field, and skipping the encryption_key attribute name

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

bartek added inline comments.
services/reports/src/database/item.rs
68 ↗(On Diff #34843)

Is this the place you mentioned unnecessary allocations?

shared/comm-lib/src/database.rs
758 ↗(On Diff #34843)

Which variants are missing?

This revision is now accepted and ready to land.Dec 20 2023, 12:17 AM
services/reports/src/database/item.rs
68 ↗(On Diff #34843)

Yes. This could potentially be optimized if we used the resulting AttributeMap for both size measuring and for call to dynamodb sdk, but it would make the function signatures not as nice.

shared/comm-lib/src/database.rs
758 ↗(On Diff #34843)

All variants in the current SDK are handled, but there is a AttributeValue::Unknown variant:

/// The `Unknown` variant represents cases where the server sent a value that wasn't recognized
/// by the client. This can happen when the server adds new functionality, but the client has not been updated.

Additionally the enum is marked as #[non_exhaustive] so we will need a catch-all _ => anyway.

Alternatively we could panic here, because the chance of that happening are very small (someone would have to call this function with a AttributeMap fetched from aws _with_ some new types) but it didn't feel right so I added a Result.