[identity] Add exponential backoff for one-time key retries
Summary:
Patrially addresses ENG-9264.
Used exponential backoff utility from our batch helper in our OTK logic, without introducing dedicated transact write helper, which would be much more involved.
Test Plan:
Started local Identity. Registered a user device and uploaded a bunch of OTKs.
Added the following test to Commtest = it spawns 9 simultaneous GetOutboundKeys RPC calls:
#[tokio::test] async fn otks() { // the device you uploaded OTKs for let user_id = "** paste here **"; let device_id = "** paste here **"; let access_token = "** paste here **"; let request = OutboundKeysForUserRequest { user_id: user_id.to_string(), }; // spawn 9 simultaneous GetOutboundKeys calls for i in 1..10 { let request = request.clone(); let user_id = request.user_id.clone(); let device_id = device_id.to_string(); let access_token = access_token.to_string(); tokio::spawn(async move { let mut client = get_auth_client( &service_addr::IDENTITY_GRPC.to_string(), user_id, device_id.clone(), access_token, PlatformMetadata::new(PLACEHOLDER_CODE_VERSION, DEVICE_TYPE), ) .await .expect("Couldn't connect to identity service"); let result = client .get_outbound_keys_for_user(request.clone()) .await .unwrap() .into_inner(); let device = result.devices.get(&device_id).unwrap(); println!( "Thread {}: content={:?}, notif={:?}", i, device.one_time_content_prekey, device.one_time_notif_prekey ); }); } println!("Now sleeping"); tokio::time::sleep(Duration::from_secs(3)).await; println!("Done"); }
Before introducing the backoff, was able to reproduce the issue (lowered max retries to 3, localstack is too fast, it was rarely failing with 8).
---- otks stdout ---- Now sleeping Thread 3: content=Some("pVPfPv8GwO6A1nmh9sTCTtgXY1SgTpkg1axzSLfpKYO"), notif=Some("0glisLhk42UpSlSzPvRG1Ni54QOdrDnBiG22MI6c0V8") Thread 4: content=Some("6s8AJVc21lEo5b7O70HlpsHJQI0xTMLNCRhERDBI0fC"), notif=None Thread 7: content=None, notif=Some("pVPfPv8GwO6A1nmh9sTCTtgXY1SgTpkg1axzSLfpKYO") Thread 8: content=Some("oTPvgqPb159Xrc3lcT9AxqTZ0KmhXDQS9BCIm6NkV1Q"), notif=None Thread 2: content=None, notif=None Thread 5: content=None, notif=None Thread 9: content=None, notif=None Thread 1: content=None, notif=Some("6s8AJVc21lEo5b7O70HlpsHJQI0xTMLNCRhERDBI0fC") Thread 6: content=Some("mUlLHPepdsZtVD9FEkwgxuuLJAOO7rwiJrG4nrllsEy"), notif=None Done
Also saw several MaxRetriesExceeded in Identity console
After introducing the backoff, the OTKs were always returned for all threads:
---- otks stdout ---- Now sleeping Thread 7: content=Some("2LCsQHnU7CtVRbmrtK9U4hASqrauEz0xumpoikNX1XH"), notif=Some("Udmu7PoHUkP9VvrcHObwlJ69AwTcdk13OfidsHgzC9l") Thread 3: content=Some("q2AeT2WSq591qLaMP91guNmrPXnWsQp0JQUzKdtnImC"), notif=Some("wCZ558PHRHYinPWGqCxUUE2BnjpesN03IMdLMplV9ii") Thread 5: content=Some("KedtvWnxaeBqZkA0l52H9yZFAPaaMJuyDfgQQNJ1aIn"), notif=Some("u2wNG3GhPzVWaWaeYauSvfed9RR6x8i2zCsRuQMd9lv") Thread 4: content=Some("9LzxGjX47mr3w6MObSG30URQ6LdLOBFOpgmeh6yLn1w"), notif=Some("2LCsQHnU7CtVRbmrtK9U4hASqrauEz0xumpoikNX1XH") Thread 2: content=Some("jhketAygIKVLRWFm454ceAjxAQOQVlZerbGL2mtOP8Y"), notif=Some("q2AeT2WSq591qLaMP91guNmrPXnWsQp0JQUzKdtnImC") Thread 1: content=Some("Z76yL9r8hlKJ0XoVUAV9CENU8IFIoUTWOo74iuHyNK8"), notif=Some("KedtvWnxaeBqZkA0l52H9yZFAPaaMJuyDfgQQNJ1aIn") Thread 6: content=Some("0glisLhk42UpSlSzPvRG1Ni54QOdrDnBiG22MI6c0V8"), notif=Some("9LzxGjX47mr3w6MObSG30URQ6LdLOBFOpgmeh6yLn1w") Thread 8: content=Some("pVPfPv8GwO6A1nmh9sTCTtgXY1SgTpkg1axzSLfpKYO"), notif=Some("jhketAygIKVLRWFm454ceAjxAQOQVlZerbGL2mtOP8Y") Thread 9: content=Some("6s8AJVc21lEo5b7O70HlpsHJQI0xTMLNCRhERDBI0fC"), notif=Some("Z76yL9r8hlKJ0XoVUAV9CENU8IFIoUTWOo74iuHyNK8") Done
Reviewers: varun, kamil
Reviewed By: varun, kamil
Subscribers: ashoat, tomek
Differential Revision: https://phab.comm.dev/D13356