Page MenuHomePhabricator

[services][blob] Add reverse index database methods
ClosedPublic

Authored by bartek on Nov 21 2022, 12:56 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Apr 3, 4:28 AM
Unknown Object (File)
Wed, Apr 3, 4:28 AM
Unknown Object (File)
Wed, Apr 3, 4:28 AM
Unknown Object (File)
Wed, Apr 3, 4:28 AM
Unknown Object (File)
Wed, Apr 3, 4:27 AM
Unknown Object (File)
Wed, Apr 3, 4:18 AM
Unknown Object (File)
Mar 26 2024, 12:54 AM
Unknown Object (File)
Mar 11 2024, 8:25 PM
Subscribers

Details

Summary

Implemented methods for managing the blob-service-reverse-index table. They will be needed in subsequent diffs. This is mostly the same approach as in D5693 (parent diff).

Link to the C++ counterpart

Part of ENG-2300

Depends on D5693

Test Plan

Called these locally and ensured that entities are properly inserted/queried/removed from database.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

bartek held this revision as a draft.

Did you intentionally leave this diff in Draft state? Just want to make sure it isn't stuck here for some other (possibly CI-related?) reason

bartek published this revision for review.Nov 22 2022, 1:54 AM
varun requested changes to this revision.Nov 22 2022, 10:17 AM

Can you explain what the reverse index table is for?

This revision now requires changes to proceed.Nov 22 2022, 10:17 AM

Currently we have two tables:

  • blob, represented by the BlobItem entity - it stores S3 paths identified by blob hashes - so one entity here corresponds to one S3 object. Each blob hash is unique here.
  • reverse_index - represented by the ReverseIndexItem entity - it stores the holder - blob hash relationship. Holders are unique for each row, but each hash can have multiple holders.

For example, we have a blob in S3 some_bucket/my_blob1, which has two holders: holderA and holderB.
The blob table looks like this:

blob_hash | s3_path
____________________
my_blob1  | some_bucket/my_blob1

And the reverse_index table:

holder   | blob_hash
________________________
holderA  | my_blob1
holderB  | my_blob1

So when holder A wants to get a blob, at first the blob hash is retrieved from reverse_index, then actual S3 path is retrieved from the blob table.

That's how current C++ implementation works. However, the Blob proto API is still subject to discussions and changes.

To be clear, the fact that the Get API takes a holder instead of a hash is something we agree should change. I think it's tracked in ENG-430?

Thanks for the explanation. This approach makes sense, but I'm wondering if we can consolidate the two tables in the future and just have a list of holders for each blob_hash in the blob table... Curious if you've thought about this.

This revision is now accepted and ready to land.Nov 23 2022, 12:17 AM

That would require changing both Get and Remove APIs to take a blobHash. Get can take just the blobHash, but Remove would need to take both blobHash and holder. Not a bad idea, but we should follow-up and discuss in ENG-430 – changing the API is out-of-scope here