Page MenuHomePhabricator

[Identity] Fix opaque login
ClosedPublic

Authored by jon on Jun 30 2023, 3:32 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Dec 15, 4:00 AM
Unknown Object (File)
Sun, Dec 15, 4:00 AM
Unknown Object (File)
Sun, Dec 15, 4:00 AM
Unknown Object (File)
Sun, Dec 15, 4:00 AM
Unknown Object (File)
Sun, Dec 15, 4:00 AM
Unknown Object (File)
Sun, Dec 15, 3:59 AM
Unknown Object (File)
Sun, Dec 15, 3:45 AM
Unknown Object (File)
Sat, Nov 30, 6:44 PM
Subscribers

Details

Summary

The DDB query operation coupled with KeyConditionExpression
will project only the PartitionKey and the conditional attribute that
you filtered by. So the resulting query would only return username and
user_id for a username query.

Re-querying by partition_key allows for the entirity of the object to be
returned.

Depends on D8400

Test Plan

Login with password.
Tested by Keyserver login diff later on.

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

The DDB query operation coupled with KeyConditionExpression
will project only the PartitionKey and the conditional attribute that
you filtered by.

I think it's more about the index projection than the KeyConditionExpression. It is now defined as projection_type = "KEYS_ONLY" in terraform.

services/identity/src/database.rs
541 ↗(On Diff #28318)

IMO awaiting inside calls/parentheses looks a bit messy. We don't do it in JS either

This revision is now accepted and ready to land.Jun 30 2023, 11:08 PM
jon marked an inline comment as done.

Address feedback

should we just change the index projection to INCLUDE or ALL?

should we just change the index projection to INCLUDE or ALL?

I'm not sure. DDB is outside my expertise. CC @bartek

This revision now requires review to proceed.Jul 6 2023, 11:33 AM

It depends on how often we query by index (UserInfo a.k.a username) vs normal table primary key (userID) vs how many columns (attributes) the column has in total.

  • Projecting ALL will technically duplicate the whole table in an index (2x storage).
  • Querying a keys-only index and then re-query the table (done in this diff) consumes another RCU (read capacity unit) because we do 2 read requests instead of one.

In our case:

  • If we query the users table often and/or it has only few columns, changing projection to ALL makes sense, because storage costs will be lower than query costs
  • If the query is comparatively rare (to querying by id), then we can leave it as is.
  • If query-by-username requires only some additional columns, not all, the INCLUDE is the way to go

If we added some monitoring (idk if aws has built-in tools precise enough), we can see in practice how it scales and change the approach later (note that it will need a migration: 1. create a new index, 2. switch code to use it, 3. delete the old one)

I suspect that query-by-username will be a common flow. Here are the situations in which it will happen:

  1. When a user registers an account, we need to check if their selected username is already in use
  2. When a user searches Comm, we need to search usernames
    • Current interface: chat composer. This will stay
    • Current interface: friend list and block list. This will stay
    • Current interface: search in chat list. This will be replaced by the next one
    • Future interface: global search

Additionally, it's important to call out that we will need "prefix search" for some of these use cases. Eg. we should be able to return "ashoat" if the user searches for "ash".

It may be the case that DDB is not a good option for solving this problem. I haven't thought about it deeply. The tasks for this are ENG-4131 and ENG-4132

I suspect that query-by-username will be a common flow. Here are the situations in which it will happen:

  1. When a user registers an account, we need to check if their selected username is already in use

Okay, registration is common, but request rates still will be relatively low (we don't expect hundreds of registrations per minute)

  1. When a user searches Comm, we need to search usernames
    • Current interface: chat composer. This will stay
    • Current interface: friend list and block list. This will stay
    • Current interface: search in chat list. This will be replaced by the next one
    • Future interface: global search

Additionally, it's important to call out that we will need "prefix search" for some of these use cases. Eg. we should be able to return "ashoat" if the user searches for "ash".

It may be the case that DDB is not a good option for solving this problem. I haven't thought about it deeply. The tasks for this are ENG-4131 and ENG-4132

Yeah, prefix search is DDB's weak part - we can do it only by "sort keys" which is not the case here. We would probably need a separate search engine for this purpose.

I think we can just stick with Jon's current approach here

For the searching usernames piece, I think we'll have to use AWS CloudSearch or Elasticsearch on top of DDB. https://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-dynamodb-data.html

In D8402#249107, @varun wrote:

I think we can just stick with Jon's current approach here

For the searching usernames piece, I think we'll have to use AWS CloudSearch or Elasticsearch on top of DDB. https://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-dynamodb-data.html

Yep, makes sense

This revision is now accepted and ready to land.Jul 7 2023, 11:42 AM
services/identity/src/database.rs
19 ↗(On Diff #28449)

i think handle_db_error can be removed here

jon added inline comments.
services/identity/src/database.rs
19 ↗(On Diff #28449)

yea, you're right, initially used it but ended up just using `DBItemError instead

jon marked an inline comment as done.

Rebase on master

This revision was landed with ongoing or failed builds.Jul 16 2023, 7:39 PM
This revision was automatically updated to reflect the committed changes.