Details

Reviewers

• karol
tomek

Commits

rCOMMc3089d0e93e1: [services] Tunnelbroker - Add AMQP shared channel locking

Summary

Following comments in D4740 from the synchronization question and forward, ENG-1495 task, we should use a synchronization mechanism in access to the AMQP Client channel which is accessed from the different threads (gRPC threads + DeliveryBroker thread) by calling send() and ack() methods.
When the number of threads grows non-synchronized access to the shared channel causes a segmentation fault or AMQP-CPP TCP Connection buffer error.

This diff introduces the use of C++17 std::scoped_lock which provides a convenient RAII-style mechanism for owning a shared channel.
Usage of the std::scoped_lock was added to the send() and ack() methods which are called from the threads.

Related Linear task: ENG-1495

Test Plan

Successfully pass a stress test in D4768 with multiple threads sending messages at once.

Diff Detail

Repository

rCOMM Comm

Branch

add-mutex-to-amqpmanager

Lint

No Lint Coverage

Unit

No Test Coverage

Event Timeline

• max created this revision.Aug 6 2022, 11:32 AM

• max held this revision as a draft.

Herald added subscribers: • abosh, • karol, atul and 3 others. · View Herald TranscriptAug 6 2022, 11:32 AM

• max added parent revisions: D4740: [services] Tunnelbroker - Wrap `connect()` into `init()` in AmqpManager, D4741: [services] Tunnelbroker - Add `waitUntilReady` function in AmqpManager, D4742: [services] Tunnelbroker - Fix `amqpReady` assignment.Aug 6 2022, 11:33 AM

Mutex include was added to the header file.

Harbormaster completed remote builds in B11191: Diff 15399.Aug 6 2022, 11:44 AM

• max edited the summary of this revision. (Show Details)Aug 6 2022, 11:47 AM

• max edited the test plan for this revision. (Show Details)

Harbormaster completed remote builds in B11192: Diff 15400.Aug 6 2022, 11:49 AM

• max added a child revision: D4743: [services] Tunnelbroker - Refactor handler names in Amqp `connectInternal()`.Aug 6 2022, 11:50 AM

• max added a child revision: D4744: [services] Tunnelbroker - Fix AMQP client reconnection algorithm.

• max added a child revision: D4746: [services] Tunnelbroker - Changes in Amqp messages testing to send and listen in different threads.Aug 6 2022, 11:59 AM

• max added a child revision: D4749: [services] Tunnelbroker - Add timeout for a pop message waiting in AmqpManager tests.

• max edited the summary of this revision. (Show Details)Aug 6 2022, 12:02 PM

Name for a scoped_lock was added.

Harbormaster completed remote builds in B11197: Diff 15406.Aug 6 2022, 12:54 PM

• max added a child revision: D4768: [services] Tunnelbroker - Amqp Manager in parallel threads messages throughput test.Aug 6 2022, 1:06 PM

• max edited the test plan for this revision. (Show Details)

• max added reviewers: • karol, tomek.

• max published this revision for review.Aug 6 2022, 1:08 PM

• max edited the summary of this revision. (Show Details)

• max mentioned this in D4740: [services] Tunnelbroker - Wrap `connect()` into `init()` in AmqpManager.Aug 6 2022, 1:13 PM

I'm slightly confused now. After this diff we can access the amqpChannel only from a single thread at a time. Is it going to degrade the performance? If the channel has to be accessed only from a single thread, maybe we should make it thread local and have an instance for each thread?

services/tunnelbroker/src/Amqp/AmqpManager.cpp
135 ↗	(On Diff #15406)	I guess we should use curly brackets initialization

This revision now requires changes to proceed.Aug 10 2022, 5:44 AM

This diff introduces the use of C++17 std::scoped_lock

Does our compilation process support C++17?

In D4767#138282, @tomek wrote:

I'm slightly confused now. After this diff we can access the amqpChannel only from a single thread at a time. Is it going to degrade the performance? If the channel has to be accessed only from a single thread, maybe we should make it thread local and have an instance for each thread?

This degrades performance but in a stress test, in D4768 it shows that we have enough throughput of nearly 2000 messages per second during send/receive in this blocking model. We should consider using a pool of connections which is a common practice. This is a future task for the next version ENG-1562 because creating and managing a pool of connections will take no small amount of time and the current performance looks ok for our needs at the moment.

have an instance for each thread?

Creating a new instance with a separate TCP connection/channel to the AMQP server on each gRPC thread is a bad practice and will cause flooding in TCP connections and a network stack. RabbitMQ servers also have a limit of 100 connections. Creating a pool of connections and using a pool in ENG-1562 is a best solution not to flood the TCP connection and not to block on the single one.

In D4767#138286, @tomek wrote:

This diff introduces the use of C++17 std::scoped_lock

Does our compilation process support C++17?

We are using C++17 in other services as well. We have these changes a long time ago and the compiler supports C++17.

Change to curly braces initialization.

• max marked an inline comment as done.Aug 10 2022, 6:02 AM

• max added inline comments.

services/tunnelbroker/src/Amqp/AmqpManager.cpp
135 ↗	(On Diff #15406)	I guess we should use curly brackets initialization Sure! Thanks.

Harbormaster completed remote builds in B11272: Diff 15509.Aug 10 2022, 6:12 AM

Does our compilation process support C++17?

I added C++17 support for clang+MacOS in https://phab.comm.dev/D4766

This degrades performance but in a stress test, in D4768 it shows that we have enough throughput of nearly 2000 messages per second during send/receive in this blocking model.

the current performance looks ok for our needs at the moment.

What is the estimate for the throughput we should be able to handle? Could you explain how it was estimated that the current performance is ok?

Even if the throughput is high enough, blocking model should be replaced soon, because one slow operation can block all the others, and that isn't acceptable, so I think prioritizing ENG-1562 might be a good idea @ashoat

Accepting, as we have a ticket to handle the improvement.

This revision is now accepted and ready to land.Aug 11 2022, 4:55 AM

Agree with prioritizing ENG-1562. It's slated for the next release of Tunnelbroker right now (0.5)

In D4767#138735, @tomek wrote:

This degrades performance but in a stress test, in D4768 it shows that we have enough throughput of nearly 2000 messages per second during send/receive in this blocking model.

the current performance looks ok for our needs at the moment.

What is the estimate for the throughput we should be able to handle?

I have a full description of the throughout test in ENG-1562:

...It's pretty enough throughput shown by the stress test from D4768 (100 threads sending 10 messages in parallel) which is around 2000 messages per second in a testing sandbox environment:

MacBook M1, Docker limits: CPU 4 cores, RAM 3 GB.
Docker container with the Tunnelbroker, compiler in debug mode.
Docker container with the RabbitMQ image.

Could you explain how it was estimated that the current performance is ok?

From my previous experiences, 2000 requests per second is a good number as a starting point. Not sure the other infrastructure services like DynamoDB instance can handle more for one connection.

Even if the throughput is high enough, blocking model should be replaced soon, because one slow operation can block all the others, and that isn't acceptable, so I think prioritizing ENG-1562 might be a good idea @ashoat

The blocking model in a certain scenario will not be a bottleneck. Because of message broker throughput is very very high compared to the databases. If there is a blocking connection to the message broker - actually something not good happened like the message broker can't handle the messages at all. The current blocking model in the message broker client will be a latency bottleneck, the latency will grow along with the number of messages per second - that's true. Also, we have a dynamodb database which I'm sure will be a much bigger bottleneck in our scenario.

Agree with prioritizing ENG-1562. It's slated for the next release of Tunnelbroker right now (0.5)

Looking at the 0.5 backlog it's a good idea to prioritize it.

Merge on parent changes.

This revision was landed with ongoing or failed builds.Aug 22 2022, 7:04 AM

Closed by commit rCOMMc3089d0e93e1: [services] Tunnelbroker - Add AMQP shared channel locking (authored by • max). · Explain Why

This revision was automatically updated to reflect the committed changes.

• max added a commit: rCOMMc3089d0e93e1: [services] Tunnelbroker - Add AMQP shared channel locking.

Harbormaster completed remote builds in B11501: Diff 15817.Aug 22 2022, 7:06 AM

• max removed a child revision: D4768: [services] Tunnelbroker - Amqp Manager in parallel threads messages throughput test.Aug 24 2022, 5:35 AM

[services] Tunnelbroker - Add AMQP shared channel locking
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 15399

services/tunnelbroker/src/Amqp/AmqpManager.h

services/tunnelbroker/src/Amqp/AmqpManager.cpp

[services] Tunnelbroker - Add AMQP shared channel lockingClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 15399

services/tunnelbroker/src/Amqp/AmqpManager.h

services/tunnelbroker/src/Amqp/AmqpManager.cpp

[services] Tunnelbroker - Add AMQP shared channel locking
ClosedPublic
Actions

Revision Contents
Changeset List