Page MenuHomePhabricator

[services] [5/n] prefix search websocket server
ClosedPublic

Authored by will on Dec 18 2023, 2:15 PM.
Tags
None
Referenced Files
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Unknown Object (File)
Mon, Dec 30, 12:04 AM
Subscribers

Details

Reviewers
bartek
varun
kamil
Group Reviewers
Restricted Owners Package(Owns No Changed Paths)
Commits
rCOMMc028c2a4f299: [services] [5/n] prefix search websocket server
Summary

This diff implements a Websocket server on the identity service that queries users on the Opensearch domain with prefixes. Currently, these diffs implements just the upgrade and prefix searching. Diffs for authentication are pending testing.

The websocket server is initialized on port 50057 on the identity server.

Depends on D9878

Test Plan

Tested on staging with a custom image. Upgrade successful and prefix search returns a list of user_id, username pairs.

Diff Detail

Repository
rCOMM Comm
Branch
identity-search
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

Owners added a reviewer: Restricted Owners Package.Dec 18 2023, 2:15 PM
will edited the test plan for this revision. (Show Details)
will edited reviewers, added: bartek, varun; removed: Restricted Owners Package.

Looking for some clarification on a few things. Tunnelbroker currently keeps a heartbeat and terminates on timeout. Do we wish to do the same with clients connected to this websocket server? |

Currently, in terms of error handling, I've opted to just use expects similar to Tunnelbroker. The tokio task simply panics with the expect message. I believe that this is alright for now, but wanted to check whether we wanted to use Results instead.

Harbormaster returned this revision to the author for changes because remote builds failed.Dec 18 2023, 2:29 PM
Harbormaster failed remote builds in B25200: Diff 34817!
will requested review of this revision.Dec 18 2023, 10:48 PM
will edited the summary of this revision. (Show Details)
will edited the test plan for this revision. (Show Details)
bartek added a subscriber: kamil.

Not requesting changes to let others see this as well.

General notes:

  • Terraform looks okay ๐Ÿ‘
  • Hyper / upgrade request stuff looks okay (we do the same in Tunnelbroker, I'll move this part to shared code soon)
  • My comment in main.rs is crucial for correct error handling.
  • Open question to other reviewers about port numbers (see inline comments)
  • @kamil could you take a look at the accept_connection() part, especially message and error handling there? You're probably the most experienced person here. I think doing expect() terminates the websocket connection, but I'm unsure about other possible side effects
services/identity/src/main.rs
85โ€“92 โ†—(On Diff #34817)

Look at these comments: https://linear.app/comm/issue/ENG-4795/improve-tunnelbroker-error-handling#comment-e5b7ef52

You probably want the same here. Tunnelbroker has it wrong

services/identity/src/websockets/mod.rs
145โ€“147 โ†—(On Diff #34817)
  1. Should the "OPENSEARCH_ENDPOINT" be in constants.rs?

2, Might be a good idea to use clap: add --opensearch-endpoint CLI flag which handles the env var and default value for you. See config.rs, it's a matter of adding a single field to the struct.

187โ€“189 โ†—(On Diff #34817)

Up to you, might be more readable

200โ€“203 โ†—(On Diff #34817)

Perhaps this is purposeful design idea, I don't know, but I don't see a point here

Why is this results field a list of stringified JSONs? Why do we stringify them again instead of simply returning list of User objects?

I.e. why doing this:

{
  "action": "searchResults",
  "results": ["{\"username\":\"john\",\"user_id\":\"123\"}", "..."]
}

instead of:

{
  "action": "searchResults",
  "results": [
     {
        "username": "john", 
        "user_id": "123"
     },
     { ... }
  ]
}
services/terraform/remote/service_identity.tf
11โ€“12 โ†—(On Diff #34817)

I don't have any strong opinion on this, but what about following Tunnelbroker's port convention? This is:

  • 50051 - grpc
  • 51001 - websocket

Since Identity exposes gRPC on 50054, how about exposing WebSocket on 51004?

14 โ†—(On Diff #34817)

You don't need this one unless you use services_connect_configuration to expose WebSocket to other services

Not requesting changes to let others see this as well.

General notes:

  • Terraform looks okay ๐Ÿ‘
  • Hyper / upgrade request stuff looks okay (we do the same in Tunnelbroker, I'll move this part to shared code soon)
  • My comment in main.rs is crucial for correct error handling.
  • Open question to other reviewers about port numbers (see inline comments)
  • @kamil could you take a look at the accept_connection() part, especially message and error handling there? You're probably the most experienced person here. I think doing expect() terminates the websocket connection, but I'm unsure about other possible side effects

Yeah. The port number of 50057 was chosen arbitrarily as the next free port after 50054. If there's a convention we're using, we should definitely use that.

will retitled this revision from [5/n] prefix search websocket server to [services] [5/n] prefix search websocket server.Dec 19 2023, 8:26 PM
services/terraform/remote/service_identity.tf
11โ€“12 โ†—(On Diff #34817)

that seems reasonable to me

will edited the test plan for this revision. (Show Details)

Update with review feedback. Updated port, select!, config endpoints

Owners added a reviewer: Restricted Owners Package.Dec 21 2023, 11:06 PM

Thanks for addressing my feedback! Mostly LGTM, still letting others view it.

Only my inline question about stringifying nested JSON still remains unanswered

nix/dev-shell.nix
118

Just curious - what is it required for?

services/identity/Cargo.toml
12

Do we need this dependency? It should work without it

Edit: Ahh ok, Tunnelbroker has this one too. This was my mistake. I think it can be removed

services/identity/src/main.rs
38

perhaps you can use BoxedError from src/websocket/mod.rs (just make it pub)

services/identity/src/websockets/mod.rs
200โ€“203 โ†—(On Diff #34817)

Any response here?

will added inline comments.
nix/dev-shell.nix
118

This is seemingly required for cargo to build. I believe the compilation errors began occurring after using the latest version of reqwest. I tried using an older version of reqwest but it was incompatible with our tokio runtime version.

So need to put these in our nix files for everything to build on server and local dev.

services/identity/src/websockets/mod.rs
200โ€“203 โ†—(On Diff #34817)

My bad, forget to respond. So stringifying in my earlier diff was definitely an oversight. I mistakenly believed that serde_json::json! would convert the usernames to objects.

Correct me if I'm wrong, but I believe I have addressed this issue in my last diff update by collecting them as values.

services/identity/src/websockets/mod.rs
193โ€“205

If hit.document() return type is User, and User has #[derive(Serialize)], you can even skip the Value type, the json! should recognize User automatically

200โ€“203 โ†—(On Diff #34817)

Sorry, missed your change. I was just curious if that stringifying was intentional.

One question and one suggestion, otherwise LGTM

services/identity/src/websockets/mod.rs
154

I don't think using expect() in this loop is a good idea, it's probably better to log the error and break the loop to execute outgoing.close() to shut it down gracefully and ensure the connection is closed

183โ€“187

I am not familiar with the entire architecture of this solution, but shouldn't we, instead of ' expect()` to keep the connection open and e.g. send a message to the client that something went wrong? Terminating code and breaking socket connection seems excessive, but maybe that's the intention, I am just asking

services/identity/Cargo.toml
12

FYI: I removed it from tunnelbroker in https://phab.comm.dev/D10444#inline-63663

Error messages intead of panic and other review comments

log reqwest error instead of mapping to SearchError

Any idea why so many of the CI builds are failing?

will marked 8 inline comments as done.

Previous reqwest change didn't include error type change

Any idea why so many of the CI builds are failing?

Error on my part. I made some changes for it to build last night but didn't pick up all my changes with arc diff.

Still working on resolving CommTest build errors. That's still caused by the lambda workflow.

Another note for my reviewers. I decided to implement a function that sends errors back to the websocket client similar to tunnelbroker.

Tunnelbroker's send_error_init_response handles this by taking ownership of the SplitSink named outgoing. This is okay as the tunnelbroker websocket server returns from the loop after the error.

With Kamil's suggestion, we are using continue in the loop instead of breaking from the loop. This meant the function would need to borrow the SplitSink instead. However, this causes errors with the line

tokio::spawn(async move {
  accept_connection(websocket, addr).await;
});

as this future is not Send due to send_error_response taking a mutable reference of the SplitSink (in my first attempt).

Therefore, I decided to use Arc and Mutex. However, another option might have been to forward tasks to the SplitSink with an mpsc channel and clone mpsc Senders to each new error response thread. Just putting this out there in case I missed something or I should use the mpsc solution.

Rebase from lambda build fix

accepting but left a couple comments inline

nix/dev-shell.nix
118

it would be great to understand a little better why we need this dependency

services/identity/src/websockets/mod.rs
192 โ†—(On Diff #35145)

I think using mpsc::channel might be better but i don't have a strong preference

This revision is now accepted and ready to land.Jan 2 2024, 1:27 PM
nix/dev-shell.nix
118

This is currently an open issue on reqwest. There is recent discussion around making this dependency optional but seemingly no work by the maintainer.

https://github.com/seanmonstar/reqwest/issues/2006#issue-1952682498

nix/dev-shell.nix
118

got it, thanks for explaining!

nix/dev-shell.nix
118

Might be good to add a comment to that effect to the code, so we remember to remove it in case the issue is ever resolved

Move errors to their own file for work on [6/n]

This revision was automatically updated to reflect the committed changes.
nix/dev-shell.nix
118

I'm not sure this comment was ever responded to

nix/dev-shell.nix
118

Here's the diff for the comment https://phab.comm.dev/D10668