[services][blob] Add reverse index database methods
Closed · Public

Authored by bartek on Nov 21 2022, 8:56 AM.

Details

Reviewers

varun
• jon
tomek
marcin

Commits

rCOMM0fc78b1298e1: [services][blob] Add reverse index database methods

Summary

Implemented methods for managing the blob-service-reverse-index table. They will be needed in subsequent diffs. This is mostly the same approach as in D5693 (parent diff).

Link to the C++ counterpart

Part of ENG-2300

Depends on D5693

Test Plan

Called these methods locally and verified that entities are properly inserted into, queried from, and removed from the database.

Diff Detail

Repository

rCOMM Comm

Branch

barthap/blob-rust

Lint

Lint Skipped

Unit

Tests Skipped

Event Timeline

bartek created this revision. Nov 21 2022, 8:56 AM

bartek held this revision as a draft.

Herald added subscribers: atul, ashoat. Nov 21 2022, 8:56 AM

bartek added 1 blocking reviewer(s): varun. Nov 21 2022, 8:58 AM

Harbormaster completed remote builds in B13599: Diff 18628. Nov 21 2022, 9:10 AM

bartek edited the summary of this revision. Nov 21 2022, 11:57 AM

Did you intentionally leave this diff in Draft state? Just want to make sure it isn't stuck here for some other (possibly CI-related?) reason

bartek added a child revision: D5700: [services][blob] Add helpers and constants for Get RPC. Nov 22 2022, 8:53 AM

bartek published this revision for review. Nov 22 2022, 9:54 AM

Can you explain what the reverse index table is for?

This revision now requires changes to proceed. Nov 22 2022, 6:17 PM

Currently we have two tables:

blob, represented by the BlobItem entity: it stores S3 paths identified by blob hashes, so one entity here corresponds to one S3 object. Each blob hash is unique in this table.
reverse_index, represented by the ReverseIndexItem entity: it stores the holder-to-blob-hash relationship. Each holder appears in exactly one row, but one blob hash can have multiple holders.

For example, we have a blob in S3 some_bucket/my_blob1, which has two holders: holderA and holderB.
The blob table looks like this:

blob_hash | s3_path
____________________
my_blob1  | some_bucket/my_blob1

And the reverse_index table:

holder   | blob_hash
________________________
holderA  | my_blob1
holderB  | my_blob1

So when holderA wants to get a blob, the blob hash is first retrieved from reverse_index, and then the actual S3 path is retrieved from the blob table.
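The two-step lookup can be sketched in Rust as follows. This is only an in-memory model built on `HashMap`, not the database code from this diff: the field names follow the table columns above, while `InMemoryDb`, its method names, and `example_db` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// One row of the `blob` table: a unique blob hash and the S3 path it maps to.
#[derive(Debug, Clone, PartialEq)]
pub struct BlobItem {
    pub blob_hash: String,
    pub s3_path: String,
}

/// One row of the `reverse_index` table: a unique holder and the hash it refers to.
#[derive(Debug, Clone, PartialEq)]
pub struct ReverseIndexItem {
    pub holder: String,
    pub blob_hash: String,
}

/// Hypothetical in-memory stand-in for the two tables (assumption, not the real storage).
#[derive(Default)]
pub struct InMemoryDb {
    blobs: HashMap<String, BlobItem>,                 // keyed by blob_hash
    reverse_index: HashMap<String, ReverseIndexItem>, // keyed by holder
}

impl InMemoryDb {
    pub fn put_blob_item(&mut self, item: BlobItem) {
        self.blobs.insert(item.blob_hash.clone(), item);
    }

    pub fn put_reverse_index_item(&mut self, item: ReverseIndexItem) {
        self.reverse_index.insert(item.holder.clone(), item);
    }

    /// Step 1: holder -> blob hash (reverse_index). Step 2: blob hash -> S3 path (blob).
    pub fn find_s3_path_by_holder(&self, holder: &str) -> Option<&str> {
        let index_item = self.reverse_index.get(holder)?;
        let blob_item = self.blobs.get(&index_item.blob_hash)?;
        Some(blob_item.s3_path.as_str())
    }

    /// All holders pointing at a given hash, sorted for deterministic output.
    pub fn find_holders_by_hash(&self, blob_hash: &str) -> Vec<&str> {
        let mut holders: Vec<&str> = self
            .reverse_index
            .values()
            .filter(|item| item.blob_hash == blob_hash)
            .map(|item| item.holder.as_str())
            .collect();
        holders.sort();
        holders
    }

    /// Removes a single holder row; the blob row is left untouched.
    pub fn remove_reverse_index_item(&mut self, holder: &str) -> Option<ReverseIndexItem> {
        self.reverse_index.remove(holder)
    }
}

/// Builds the example above: one blob with two holders.
pub fn example_db() -> InMemoryDb {
    let mut db = InMemoryDb::default();
    db.put_blob_item(BlobItem {
        blob_hash: "my_blob1".to_string(),
        s3_path: "some_bucket/my_blob1".to_string(),
    });
    for holder in ["holderA", "holderB"] {
        db.put_reverse_index_item(ReverseIndexItem {
            holder: holder.to_string(),
            blob_hash: "my_blob1".to_string(),
        });
    }
    db
}
```

With this model, `example_db().find_s3_path_by_holder("holderA")` resolves to `some_bucket/my_blob1` through both tables, and removing holderA leaves holderB's mapping intact.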

That's how the current C++ implementation works. However, the Blob proto API is still subject to discussion and change.

bartek requested review of this revision. Nov 22 2022, 6:32 PM

To be clear, the fact that the Get API takes a holder instead of a hash is something we agree should change. I think it's tracked in ENG-430?

Thanks for the explanation. This approach makes sense, but I'm wondering if we can consolidate the two tables in the future and just have a list of holders for each blob_hash in the blob table... Curious if you've thought about this.

This revision is now accepted and ready to land. Nov 23 2022, 8:17 AM

That would require changing both the Get and Remove APIs to take a blobHash. Get can take just the blobHash, but Remove would need to take both blobHash and holder. Not a bad idea, but we should follow up and discuss it in ENG-430; changing the API is out of scope here.
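For the ENG-430 discussion, here is a rough sketch of what the consolidated layout might look like. It is not a decided design: `ConsolidatedBlobItem`, `ConsolidatedDb`, and the method signatures are hypothetical, and the storage is again an in-memory `HashMap` rather than a real table. It only illustrates that Get would need just the blob hash, while Remove would need both the blob hash and the holder.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical consolidated row: the S3 path plus the list of holders for one blob hash.
pub struct ConsolidatedBlobItem {
    pub s3_path: String,
    pub holders: BTreeSet<String>,
}

/// Hypothetical in-memory stand-in for a single consolidated table, keyed by blob hash.
#[derive(Default)]
pub struct ConsolidatedDb {
    blobs: HashMap<String, ConsolidatedBlobItem>,
}

impl ConsolidatedDb {
    /// Registers a holder for a blob, creating the row on first use.
    pub fn put(&mut self, blob_hash: &str, s3_path: &str, holder: &str) {
        self.blobs
            .entry(blob_hash.to_string())
            .or_insert_with(|| ConsolidatedBlobItem {
                s3_path: s3_path.to_string(),
                holders: BTreeSet::new(),
            })
            .holders
            .insert(holder.to_string());
    }

    /// Get needs only the blob hash.
    pub fn get(&self, blob_hash: &str) -> Option<&str> {
        self.blobs.get(blob_hash).map(|item| item.s3_path.as_str())
    }

    /// Remove needs both the blob hash and the holder.
    /// Returns true if that holder was registered for the blob.
    /// What happens to the row once the last holder is gone is left open here.
    pub fn remove(&mut self, blob_hash: &str, holder: &str) -> bool {
        match self.blobs.get_mut(blob_hash) {
            Some(item) => item.holders.remove(holder),
            None => false,
        }
    }

    /// Holders currently registered for a blob, in sorted order.
    pub fn holders(&self, blob_hash: &str) -> Vec<&str> {
        match self.blobs.get(blob_hash) {
            Some(item) => item.holders.iter().map(|h| h.as_str()).collect(),
            None => Vec::new(),
        }
    }
}
```

One trade-off visible even in this sketch: a holder alone can no longer locate its blob, which is why the Get API change has to come first.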