This addresses ENG-11464.
Details
I inserted a 5s sleep after the user keys file was created, but before the user data file was created. This consistently reproduced the issue prior to this change. Afterwards I confirmed that the issue no longer reproed, testing twice with the sleep and twice without.
Diff Detail
- Repository
- rCOMM Comm
- Lint
Lint Not Applicable - Unit
Tests Not Applicable
Event Timeline
Accepting because this fixes the serious issue, but this is not ideal. In the inline comment, I pointed out one problem, so you could either fix it now or land and create a follow-up, since this diff improves it a lot anyway
| native/native_rust_library/src/backup.rs | ||
|---|---|---|
| 126–147 ↗ | (On Diff #50691) | There is a chance that this code might cause another issue. The case is when trigger_backup_file_upload() is called by something else (generating a log), after create_main_compaction creates a file, but before finalize_user_keys_compaction is resolved. In that case, upload_files is going to match user_data only, upload_backup is going to call backups/user_data which is going to fail with BackupError::NoBackup. This is not that bad; the backup handler should be restarted, but it's definitely something we might want to improve. This is a missing property from what Codex suggested originally, and I pointed out as an alternative, to handle this for a specific backupID to make sure that it covers the existence of both files. |
Good call! I asked Codex to address the concern, and it added an early exit to compaction.upload_files when the user_data is present but the user_keys is missing
Going to land this as-is, but would appreciate a re-review from @kamil on Monday to make sure the changes address his concern