Page MenuHomePhabricator

[tunnelbroker] Rewrite AMQP handling
AbandonedPublic

Authored by bartek on Oct 3 2024, 8:57 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 10, 2:28 PM
Unknown Object (File)
Fri, Nov 8, 10:37 PM
Unknown Object (File)
Fri, Nov 8, 1:49 PM
Unknown Object (File)
Thu, Oct 31, 11:30 AM
Unknown Object (File)
Thu, Oct 31, 12:24 AM
Unknown Object (File)
Tue, Oct 29, 5:39 PM
Unknown Object (File)
Tue, Oct 29, 1:44 AM
Unknown Object (File)
Mon, Oct 28, 6:19 AM
Subscribers

Details

Reviewers
kamil
varun
will
Summary

Big diff - This will be split into smaller ones but just in case this needs to be quickly deployed, leaving this too.
After smaller diffs are created, this will be abandoned.

Depends on D13594

Test Plan
  1. Start tunnelbroker
  2. While running Tunnelbroker commtest tests, restart RabbitMQ (e.g. via docker-compose in another terminal)
  3. Tunnelbroker logs should show reconnect attempts, after they succeed, commtest should pass

Diff Detail

Repository
rCOMM Comm
Branch
barthap/amqp
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

bartek held this revision as a draft.
services/tunnelbroker/src/amqp.rs
153–182

This function can probably be optimized by using asynchronous RwLock from Tokio instead of blocking one from std, and await lock for the whole connection reset.
However, I didn't have time to play with it so leaving it as-is now and will refactor as a follow-up

services/tunnelbroker/src/constants.rs
10

Previously we had one channel per connected device. Rabbit docs say:

Given both of these factors, limiting the number of channels used per connection is highly recommended. As a guideline, most applications can use a single digit number of channels per connection. Those with particularly high concurrency rates (usually such applications are consumers) can start with one channel per thread/process/coroutine and switch to channel pooling when metrics suggest that the original model is no longer sustainable, e.g. because it consumes too much memory.

bartek published this revision for review.EditedOct 3 2024, 9:42 AM

Publishing for review just in case we want to deploy this quickly. I'm sorry it's huge

NOTE: Edit: Created a split stack that replaces this one: D13600 ... D13608. I'll abandon this once added descriptions to these diffs (tomorrow)
services/tunnelbroker/src/websockets/session.rs
759–788

This can be potentially improved by restoring connection and retrying basic_cancel() and queue_delete(), but websocket shouldn't wait for it but close immediately regardless

Replaced with stack from D13600 which also addresses some of my concerns here

All deployed changes are now in the stack of D13600