Page MenuHomePhabricator

[identity] avoid returning errors if one-time keys are missing
ClosedPublic

Authored by varun on Sep 10 2024, 5:29 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Nov 29, 8:38 AM
Unknown Object (File)
Fri, Nov 29, 7:21 AM
Unknown Object (File)
Mon, Nov 25, 6:40 PM
Unknown Object (File)
Mon, Nov 25, 4:43 PM
Unknown Object (File)
Wed, Nov 20, 11:32 AM
Unknown Object (File)
Wed, Nov 20, 11:32 AM
Unknown Object (File)
Wed, Nov 20, 11:32 AM
Unknown Object (File)
Thu, Nov 14, 1:02 PM
Subscribers

Details

Summary

in our olm fork, one-time keys are now optional. however, in the identity service, we still have a couple places where we treat them as required, and fail certain requests if we're unable to fetch them. we should update these parts of the code to complete RPCs successfully regardless of OTK retrieval success or failure.

Test Plan
  • Manually deleted OTKs from my test account on staging, then called GetKeyserverKeys and GetOutboundKeysForUser and got successful responses from both (without any OTKs in the response)
  • Modified get_one_time_key to return an error and confirmed that both RPCs were still successful

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

varun held this revision as a draft.
varun added reviewers: bartek, kamil, will.
varun published this revision for review.Sep 10 2024, 8:16 PM

I'm letting others review as well.
Generally feels alright and solves the original issue. Left one comment that's concerning to me

services/identity/src/database.rs
786–802 ↗(On Diff #44019)

I think we should not simply unwrap_or(None) here, at least without some considerations:
The self.get_one_time_key() returns sth like Result<Option<OTK>, Error>, so it should already return Ok(None) when an OTK cannot be retrieved. However, other errors can still happen and should be somehow propagated.

I see two options here:

  1. Either we back to await? here - this results in RPC failing when AWS/DynamoDB errors occur, but not when no OTKs are available
  2. Or we discard all errors like you did here, so the RPC will succeed with None, but we should either:
    • Introduce here a log displaying the error message. This is more future-proof but might result in duplicated error alerts in our dashboard.
    • Or make sure all possible failures inside self.get_one_time_key() (and inner calls of self.get_one_time_keys()) have their respective error logs. This requires no action now, but increases maintenance cost in the future - we need to remember that errors are discarded without notice at some layer above.

After a brief look, the conditions for option 2b are met, but I'd appreciate double-checking it.

The same applies for the get_keyserver_keys

services/identity/src/database.rs
786–802 ↗(On Diff #44019)

i think 2a makes sense. the duplication is a minor issue IMO

Hey team, let's prioritize reviewing this one – I'm continuing to see identity errors in production related to missing OTKs, and I'm worried they're causing "unknown error" issues

It would be great if we could try and launch this this fix to production by the end of the week, or at least by Monday

This revision is now accepted and ready to land.Sep 12 2024, 3:04 AM

@varun FYI @bartek messages me that it looked good for him, just was waiting for me to review it as well but I just now have a chance to look at it.

@varun FYI @bartek messages me that it looked good for him, just was waiting for me to review it as well but I just now have a chance to look at it.

got it, thanks!