Details
Called these locally and ensured that entities are properly inserted/queried/removed from database.
Diff Detail
- Repository
- rCOMM Comm
- Lint
No Lint Coverage - Unit
No Test Coverage
Event Timeline
Did you intentionally leave this diff in Draft state? Just want to make sure it isn't stuck here for some other (possibly CI-related?) reason
Currently we have two tables:
- blob, represented by the BlobItem entity - it stores S3 paths identified by blob hashes - so one entity here corresponds to one S3 object. Each blob hash is unique here.
- reverse_index - represented by the ReverseIndexItem entity - it stores the holder - blob hash relationship. Holders are unique for each row, but each hash can have multiple holders.
For example, we have a blob in S3 some_bucket/my_blob1, which has two holders: holderA and holderB.
The blob table looks like this:
blob_hash | s3_path ____________________ my_blob1 | some_bucket/my_blob1
And the reverse_index table:
holder | blob_hash ________________________ holderA | my_blob1 holderB | my_blob1
So when holder A wants to get a blob, at first the blob hash is retrieved from reverse_index, then actual S3 path is retrieved from the blob table.
That's how current C++ implementation works. However, the Blob proto API is still subject to discussions and changes.
To be clear, the fact that the Get API takes a holder instead of a hash is something we agree should change. I think it's tracked in ENG-430?
Thanks for the explanation. This approach makes sense, but I'm wondering if we can consolidate the two tables in the future and just have a list of holders for each blob_hash in the blob table... Curious if you've thought about this.
That would require changing both Get and Remove APIs to take a blobHash. Get can take just the blobHash, but Remove would need to take both blobHash and holder. Not a bad idea, but we should follow-up and discuss in ENG-430 – changing the API is out-of-scope here