Page MenuHomePhabricator

Implement attachments capture
ClosedPublic

Authored by marcin on Jan 26 2024, 7:15 AM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Nov 6, 4:52 PM
Unknown Object (File)
Oct 15 2024, 12:01 PM
Unknown Object (File)
Oct 15 2024, 12:01 PM
Unknown Object (File)
Oct 15 2024, 12:01 PM
Unknown Object (File)
Oct 15 2024, 12:00 PM
Unknown Object (File)
Oct 15 2024, 12:00 PM
Unknown Object (File)
Oct 15 2024, 12:00 PM
Unknown Object (File)
Oct 15 2024, 12:00 PM

Details

Summary

This differential adds method that parses logs in search for blob hashes and stores all found hashes to an attachments file.

Test Plan

The same as for parent differential but this time ensure that attachments file are created especially if you send multimedia messages. Attachments file is
a list of blob hashes separated by newline.

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

native/cpp/CommonCpp/DatabaseManagers/SQLiteConnectionManager.cpp
102–145 ↗(On Diff #36168)

So basically we iterate over each change in a log and for UPDATE and INSERT changes affecting media table we retrieve uri value and if it contains blob hash we return it to be persisted in a file. Checking for blob service hash is exactly the same as in main compaction creation.

Harbormaster returned this revision to the author for changes because remote builds failed.Jan 26 2024, 7:35 AM
Harbormaster failed remote builds in B26186: Diff 36168!
Harbormaster returned this revision to the author for changes because remote builds failed.Jan 26 2024, 8:13 AM
Harbormaster failed remote builds in B26190: Diff 36172!

It seems like our approach here is specific to one single place we store blobs: the uri column of the media table.

  1. Are we considering old-style uploads here, which are not stored in the blob service?
  2. Do we use the blob service anywhere else in the codebase today? Do we need to add holders for those cases too?
  3. Are we planning any other uses of the blob service that are worth considering? I wonder if we should design a more generic mechanism here, eg. have a standard way of storing blob IDs across SQLite, or perhaps have a single table in SQLite containing all of them.

It seems like our approach here is specific to one single place we store blobs: the uri column of the media table.

  1. Are we considering old-style uploads here, which are not stored in the blob service?
  2. Do we use the blob service anywhere else in the codebase today? Do we need to add holders for those cases too?
  3. Are we planning any other uses of the blob service that are worth considering? I wonder if we should design a more generic mechanism here, eg. have a standard way of storing blob IDs across SQLite, or perhaps have a single table in SQLite containing all of them.
  1. Old-style uploads do not need to be handled in attachments. Attachments are specific to blob-service uploads so that the backup service can read through the list of attachments and create a holder for each blob so that it is not deleted during clean-up before someone fetches them after restore.
  2. I talked to @bartek and confirmed that uri column of media table is the only place that we need to look for attachments now.
  3. There is a task for the correct solution: https://linear.app/comm/issue/ENG-6261/introduce-blob-hash-table

Thanks for explaining!

Old-style uploads do not need to be handled in attachments. Attachments are specific to blob-service uploads so that the backup service can read through the list of attachments and create a holder for each blob so that it is not deleted during clean-up before someone fetches them after restore.

I'm aware of that. I'm asking if the uri field might contain something for old-style uploads, that we want to skip. Is that being handled here correctly?

I'm asking if the uri field might contain something for old-style uploads, that we want to skip.

Your concern is correct - see translateMediaToClientDBMediaInfos() - the uri column is used for both kinds of media. They're distinguished by having encryption_key field in the extras column's tringified JSON. But this is about encrypted vs non-encrypted media.

Is that being handled here correctly?

Yes, because the Blob vs non-blob URIs are distinguished by the comm-blob-service:// URI prefix, which is done in L149 in this diff

Thanks for explaining!

Old-style uploads do not need to be handled in attachments. Attachments are specific to blob-service uploads so that the backup service can read through the list of attachments and create a holder for each blob so that it is not deleted during clean-up before someone fetches them after restore.

I'm aware of that. I'm asking if the uri field might contain something for old-style uploads, that we want to skip. Is that being handled here correctly?

Yes - I am not adding uri to the attachments list if it doesn't start with "comm-blob-service://" prefix.

Makes sense – thanks everyone!

kamil added inline comments.
native/cpp/CommonCpp/DatabaseManagers/NativeConnectionManager.cpp
66 ↗(On Diff #36484)

seems more idiomatic

89–90 ↗(On Diff #36484)

I think something strange happened here with the formatting

102–106 ↗(On Diff #36484)

I've seen so many of those in code already, wondering if adding function like this:

void NativeConnectionManager::handleSQLError(int errorCode, const std::string& errorPrefix) {
  if (errorCode != SQLITE_OK) {
    throw std::runtime_error(
        errorPrefix + std::string(sqlite3_errstr(errorCode)));
  }
}

isn't beneficial

117 ↗(On Diff #36484)
149 ↗(On Diff #36484)

how about using std::string::compare? avoids creating a sub-string and feels more efficient

This revision is now accepted and ready to land.Feb 5 2024, 3:04 PM
This revision was automatically updated to reflect the committed changes.