Page MenuHomePhabricator

[identity] CompareUsers RPC
ClosedPublic

Authored by varun on Feb 21 2023, 9:20 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Apr 17, 3:59 AM
Unknown Object (File)
Wed, Apr 17, 3:59 AM
Unknown Object (File)
Wed, Apr 17, 3:59 AM
Unknown Object (File)
Wed, Apr 17, 3:59 AM
Unknown Object (File)
Wed, Apr 17, 3:58 AM
Unknown Object (File)
Wed, Apr 17, 3:53 AM
Unknown Object (File)
Mon, Apr 8, 2:31 PM
Unknown Object (File)
Feb 28 2024, 6:14 PM
Subscribers

Details

Summary

This RPC takes a list of users from the keyserver and returns any users in DynamoDB that are not in that list of users

One use case for this RPC is to ensure data completeness and correctness before we make Identity service the single source of truth

Adding ashoat since we're introducing a new RPC

Test Plan

this will eventually be called from a cronjob, but I simulated this by adding a compare_users client method to the keyserver and
calling the RPC with:

  1. an empty list -> got back the full list of DDB user IDs
  2. exactly the same list of users as in my DDB test table -> got back an empty list
  3. a list of random IDs not found in DDB -> got back the full list of DDB user IDs

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

varun requested review of this revision.Feb 21 2023, 9:36 PM

Resigning to avoid removing from reviewer's queue, but feel free to re-add me later

shared/protos/identity.proto
21–22 ↗(On Diff #22909)

How about the opposite? Any user IDs that are NOT in DynamoDB but ARE in the supplied list

It will be harder to "recover" in those scenarios, but I think it's still something we should be aware of and log

jon requested changes to this revision.Feb 22 2023, 3:26 PM
jon added inline comments.
services/identity/src/service.rs
242 ↗(On Diff #22909)

Since we are only calling contains() on this collection, we should probably use a HashSet which has O(1) .contains() instead of O(n)

247–252 ↗(On Diff #22909)

Using HashSets, this is a bit cleaner and quite a bit more performant ( O(n) vs O(n2) ).

This will give the users which exist in DDB, but not in mysql. Which I think isn't right, but is equal to the existing code.

248–252 ↗(On Diff #22909)

I think you want the users that are in keyserver, but not currently in DDB. Even more full proof would be to check if there's a missing user in either list. As a deletion or addition may have been missed.

shared/protos/identity.proto
21–22 ↗(On Diff #22909)

The current logic only supplies the DDB users which do not exist in mariadb.

for user in ddb_users {
  if !mysql_users.contains(&user) {
    response.push(user)
  }
}
This revision now requires changes to proceed.Feb 22 2023, 3:26 PM

switched to HashSet and fixed code logic, changed API to return <users in keyserver but not in identity> AND <users in identity but not in keyserver>

services/identity/src/service.rs
265–266 ↗(On Diff #23025)

This is similar to what HashSet's difference() method is doing under the hood, but converting to a Difference and then back to a Vec was messy, so I opted to modify the Vecs in place

update RPC description in .proto file

This revision is now accepted and ready to land.Feb 25 2023, 1:07 PM
This revision was automatically updated to reflect the committed changes.