Page MenuHomePhabricator

[keyserver] Fix olm_session_creation_failure bug due to fetchOlmUpdateAccount retry
ClosedPublic

Authored by ashoat on Mar 25 2024, 8:37 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Jan 3, 12:59 PM
Unknown Object (File)
Tue, Dec 24, 7:18 AM
Unknown Object (File)
Tue, Dec 24, 7:18 AM
Unknown Object (File)
Tue, Dec 24, 7:17 AM
Unknown Object (File)
Tue, Dec 24, 7:15 AM
Unknown Object (File)
Nov 16 2024, 5:12 PM
Unknown Object (File)
Nov 16 2024, 5:12 PM
Unknown Object (File)
Nov 16 2024, 5:10 PM
Subscribers
None

Details

Summary

I move the call to the identity service outside of the fetchOlmUpdateAccount block, to avoid a scenario where the identity service is uploaded keys that don't get persisted in the keyserver MariaDB.

See here for a description of the issue this resolves (scenario 4).

While working on this diff, I considered a potential risk that the client might end up with keys that don't get uploaded to identity. This situation where the client has more one-time keys than identity can be dangerous if the client ends up with more than 100 one-time keys, and starts pruning them. At that point, the keys pruned in the client might still be on identity.

However, this scenario seems relatively unlikely with the current implementation of one-time key refresh in the keyserver, where we debounce the requests, and they are only initiated when the number of one-time keys drops below 5.

Test Plan

Prior to this diff, I would see olm_session_creation_failure at least once every 10 registration attempts, but usually more frequent. After this diff, I was able to test registering 30 times in a row with no errors

Diff Detail

Repository
rCOMM Comm
Branch
ashoat/olm_session_creation_failure
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

keyserver/src/utils/olm-utils.js
115–116

The local variables contentOneTimeKeys and notifOneTimeKeys are assigned the most recent new one-time keys that were attempted to be saved by fetchOlmUpdateAccount

118–133

There's no reason to compose the two fetchOlmUpdateAccount calls anymore, so I decompose them

Given that both scenario and scenario 1 have to do with fetchUpdateOlmAccount, I wonder if we should consider what I suggested in my comment about scenario 1:

Perhaps the best solution to this would be to rearchitect fetchUpdateOlmAccount to use locking, so that it could support multiple simultaneous calls.

Do I understand correctly that this diff makes keyserver one time keys upload match approach that is used on native - eagerly marking one time keys as published before they get uploaded to the identity?

This revision is now accepted and ready to land.Mar 26 2024, 2:26 AM

Yes, but only for uploadNewOneTimeKeys. I left a comment on ENG-6691 mentioning this