Page MenuHomePhabricator

[keyserver] Fix olm_session_creation_failure bug due to fetchOlmUpdateAccount retry
ClosedPublic

Authored by ashoat on Mar 25 2024, 8:37 PM.
Tags
None
Referenced Files
F3655909: D11384.id38323.diff
Sun, Jan 5, 12:08 PM
F3655845: D11384.id38324.diff
Sun, Jan 5, 12:05 PM
Unknown Object (File)
Fri, Jan 3, 12:59 PM
Unknown Object (File)
Tue, Dec 24, 7:18 AM
Unknown Object (File)
Tue, Dec 24, 7:18 AM
Unknown Object (File)
Tue, Dec 24, 7:17 AM
Unknown Object (File)
Tue, Dec 24, 7:15 AM
Unknown Object (File)
Nov 16 2024, 5:12 PM
Subscribers
None

Details

Summary

I move the call to the identity service outside of the fetchOlmUpdateAccount block, to avoid a scenario where the identity service is uploaded keys that don't get persisted in the keyserver MariaDB.

See here for a description of the issue this resolves (scenario 4).

While working on this diff, I considered a potential risk that the client might end up with keys that don't get uploaded to identity. This situation where the client has more one-time keys than identity can be dangerous if the client ends up with more than 100 one-time keys, and starts pruning them. At that point, the keys pruned in the client might still be on identity.

However, this scenario seems relatively unlikely with the current implementation of one-time key refresh in the keyserver, where we debounce the requests, and they are only initiated when the number of one-time keys drops below 5.

Test Plan

Prior to this diff, I would see olm_session_creation_failure at least once every 10 registration attempts, but usually more frequent. After this diff, I was able to test registering 30 times in a row with no errors

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

keyserver/src/utils/olm-utils.js
115–116 ↗(On Diff #38323)

The local variables contentOneTimeKeys and notifOneTimeKeys are assigned the most recent new one-time keys that were attempted to be saved by fetchOlmUpdateAccount

118–133 ↗(On Diff #38323)

There's no reason to compose the two fetchOlmUpdateAccount calls anymore, so I decompose them

Given that both scenario and scenario 1 have to do with fetchUpdateOlmAccount, I wonder if we should consider what I suggested in my comment about scenario 1:

Perhaps the best solution to this would be to rearchitect fetchUpdateOlmAccount to use locking, so that it could support multiple simultaneous calls.

Do I understand correctly that this diff makes keyserver one time keys upload match approach that is used on native - eagerly marking one time keys as published before they get uploaded to the identity?

This revision is now accepted and ready to land.Mar 26 2024, 2:26 AM

Yes, but only for uploadNewOneTimeKeys. I left a comment on ENG-6691 mentioning this