Page MenuHomePhabricator

[services] Tunnelbroker - Amqp Manager in parallel threads messages throughput test
ClosedPublic

Authored by max on Aug 6 2022, 1:05 PM.
Tags
None
Referenced Files
F3771098: D4768.id15420.diff
Sun, Jan 12, 12:58 PM
F3771061: D4768.id15670.diff
Sun, Jan 12, 12:24 PM
Unknown Object (File)
Sat, Jan 11, 3:19 PM
Unknown Object (File)
Thu, Dec 26, 4:45 AM
Unknown Object (File)
Thu, Dec 26, 4:45 AM
Unknown Object (File)
Wed, Dec 25, 11:09 PM
Unknown Object (File)
Thu, Dec 19, 11:56 PM
Unknown Object (File)
Thu, Dec 19, 11:56 PM

Details

Summary

This diff introduces AMQP Manager and AMQP-C++ client implementation multithreaded access test. In this test, we are spawning 100 threads that send 10 messages each in parallel. In this case, we are checking for possible deadlocks and unsynchronized access to the shared resources inside the AMQPManager implementation.

Also, this test can show the throughput performance of the single shared AMQP channel implementation to send/receive messages.

Related Linear task: ENG-1495

Test Plan

Run yarn run-unit-tests tunnelbroker command and test is successfully passed.

Diff Detail

Repository
rCOMM Comm
Branch
add-amqp-stess-test
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

max held this revision as a draft.
karol added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

Why detach? I think we shouldn't do that.

This revision now requires changes to proceed.Aug 8 2022, 4:57 AM

Switch to use join() instead of detach().

max added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

Why detach? I think we shouldn't do that.

There is no difference here between join and detach because detached threads will be exited too at the end of the test application. It makes sense to distinguish in a long-running app-like servers and where we should control the lifetime of the threads.
Anyway, I don't mind changing it to the join().

karol added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

There is no difference here between join and detach

They're quite different, I'd say.

detached threads will be exited too at the end of the test application

Will they? Do we have control over how far with the execution they'd get?

A quick test:

#include <iostream>

#include <thread>
#include <chrono>

using namespace std;

int main()
{
  cout << "MAIN BEGIN" << endl;
  thread th([](){
    cout << "TH BEGIN" << endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    cout << "TH END" << endl;
  });
  th.detach();
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  cout << "MAIN END" << endl;
}

The output is:

MAIN BEGIN
TH BEGIN
MAIN END

Does the thread end somewhere in the background? Maybe. Do we have control over it? I don't think so, we just let it fly, right?

Now, what you did with the latest update isn't a good solution either I think. Joining every thread right after spawning them kills the whole purpose of using threads. We could just execute the code sequentially.

I think the best practice is to let them run in the background and join in the end so we're sure they're finished. Something like this:

#include <iostream>

#include <thread>
#include <chrono>

using namespace std;

int main()
{
  cout << "MAIN BEGIN" << endl;
  thread th([](){
    cout << "TH BEGIN" << endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    cout << "TH END" << endl;
  });
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
  cout << "MAIN END" << endl;
  th.join();
}
This revision now requires changes to proceed.Aug 8 2022, 7:21 AM
max marked an inline comment as done.

Remove threads joining.

services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

Now, what you did with the latest update isn't a good solution either I think. Joining every thread right after spawning them kills the whole purpose of using threads. We could just execute the code sequentially.

That makes sense.

I think the best practice is to let them run in the background and join in the end so we're sure they're finished. Something like this:

The best and the easiest way here according to the std::thread documentation is just don't use detach or join and just create them by the constructor.

We can join them at the end but looking that this is a short live running app (just test) we can omit this instead of adding an additional code to the simple test.

Thanks @kaIor I've removed the joining.

karol added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

The best and the easiest way here according to the std::thread documentation is just don't use detach or join and just create them by the constructor.

Sorry, but I don't understand what using join/detach has to do with how we spawn threads.

We can join them at the end but looking that this is a short live running app (just test) we can omit this instead of adding an additional code to the simple test.

I still don't understand why we cannot simply join here in the end.

but looking that this is a short live running app (just test) we can omit this

How is complexity related to correctness? Even the simples program should be as safe as possible and correctly written.

I still think we should join in the end.

Unjoined threads often cause crashes (not sure if always). Please, read this SO answer (which BTW includes yet one more argument against using detach).

This revision now requires changes to proceed.Aug 16 2022, 6:52 AM

Joining message sending threads at the end of the test execution.

max added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

I still think we should join in the end.

I'm sure it's 100% overengineering here, but it's easier to add it neither we'll spend a lot of additional time pushing each other, especially on a test...
I'm always good on good )

I've updated the diff with the joining of the threads at the end of the test execution.
Looks good now ;)

karol added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

Ok, cool. As a side note, it would be cool if you responded to my questions/doubts directly. Actually, none of them were addressed.

max added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
138 ↗(On Diff #15411)

The best and the easiest way here according to the std::thread documentation is just don't use detach or join and just create them by the constructor.

Sorry, but I don't understand what using join/detach has to do with how we spawn threads.

The std::thread spawns on a constructor, but we can also detach or join it. When we are joining the thread, the parent thread should wait until the tread completes while detaching completely detaching the thread and making it independent of the parent. The detached thread will end when the thread ends itself or when the main exits (including fatal errors).

We can join them at the end but looking that this is a short live running app (just test) we can omit this instead of adding an additional code to the simple test.

I still don't understand why we cannot simply join here in the end.

We can, but does it makes sense in this certain scenario? Ask yourself: how long the detached thread can live in a running test? Not long, because the test is not a long-running app like server and etc. The detached thread will die after all the tests are done or fail. The thread will be killed automatically without the need to join it.

but looking that this is a short live running app (just test) we can omit this

How is complexity related to correctness? Even the simples program should be as safe as possible and correctly written.

Following this scenario, we can implement additional safety checks that will never occur in a certain scenario/program and cover this to make a program as safe as possible. But ask yourself: what the problem detached thread in a running short-lived test can cause? Memory overflow: not in this case, segfault: not in this case, "ghosts" threads: not in this case.... so why does the developer needs to spend time to protect from the scenario which will never happen in a certain condition?

There is a tradeoff between additional complexity and necessity. We should always ask ourselves if this part of the code is really necessary? Because when we are working on a team there is a big possibility that someone else will be figuring out the code and every line of the code will take the time to figure out why it is here. And the next developer will ask: why you should join the threads if they die soon at the end of the tests? It took me time to figure out which is expensive. I respect your thoughts it is super correct! But we should ask ourselves is it really necessary to add additional complexity here?... That's my thoughts.

Unjoined threads often cause crashes (not sure if always).

That sounds like: mutexes always cause deadlocks, and shared data access causes race conditions... No, it doesn't. We should not throw anything because it needs to be taken with additional care, we just need to know that we should avoid it if it's not really necessary and take it with additional care.

I'm not against @karol your comments, they were very useful! I'm against adding complexity to simple things when it's really don't need to be.

Thanks, @karol ;) passing to @tomek to review.

tomek requested changes to this revision.Aug 17 2022, 7:19 AM

In this test, we are spawning 100 threads that send 10 messages each in parallel.

This isn't exactly correct, even if it seems like that. The problem is that we create threads and start sending messages immediately, so it could happen that all the messages from the first thread will be sent before the last thread even started. The solution is to use some synchronization, so we start sending messages only when all the threads are created. This might test the service better and will give us more accurate performance measurements.

services/tunnelbroker/test/AmqpManagerTest.cpp
144–148 ↗(On Diff #15670)

This assertion isn't really useful. If less that MESSAGES_NUMBER * THREADS_NUMBER messages are delivered, the loop will hang. If more that MESSAGES_NUMBER * THREADS_NUMBER then we will ignore them because the loop will be finished. So we need a way to verify that after MESSAGES_NUMBER * THREADS_NUMBER iterations there are no more messages.

This revision now requires changes to proceed.Aug 17 2022, 7:19 AM
services/tunnelbroker/test/AmqpManagerTest.cpp
144–148 ↗(On Diff #15670)

This isn't exactly correct, even if it seems like that. The problem is that we create threads and start sending messages immediately, so it could happen that all the messages from the first thread will be sent before the last thread even started. The solution is to use some synchronization, so we start sending messages only when all the threads are created. This might test the service better and will give us more accurate performance measurements.

I agree with you here @tomek ! But is this not too complex for a test?

In the past, we have an agreement that tests must be not complex and not so time-consuming... so the main question: is this (additional synchronization etc...) not too much for a test?

max marked an inline comment as done.
tomek requested changes to this revision.Aug 19 2022, 7:21 AM
tomek added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
144–148 ↗(On Diff #15670)

If the additional synchronization is time-consuming, we can skip it (I guess it should be rather easy, though). But checking if there are no more messages after the for loop is a lot more important, so we should have it (unless it's really complicated).

This revision now requires changes to proceed.Aug 19 2022, 7:21 AM
max marked an inline comment as done.
max added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
144–148 ↗(On Diff #15670)

If the additional synchronization is time-consuming, we can skip it (I guess it should be rather easy, though).

To inform threads to start together we can use a conditional variable. The test now is complicated itself and using mutex and CV will add an additional complexity here with a little improvement.

But checking if there are no more messages after the for loop is a lot more important, so we should have it (unless it's really complicated).

This is a good idea! But the problem here is that DeliveryBroker::getInstance().pop(toDeviceID) is a blocking read and it will wait indefinitely if no more new messages.

tomek added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
144–148 ↗(On Diff #15670)

This is a good idea! But the problem here is that DeliveryBroker::getInstance().pop(toDeviceID) is a blocking read and it will wait indefinitely if no more new messages.

That's correct, but MPMCQueue is used underneath and it should be rather easy to use isEmpty method that it has. Nevertheless, if that's too complicated, we can leave without it, but not checking this reduces usefulness of this test significantly.

This revision is now accepted and ready to land.Aug 19 2022, 10:58 AM
max marked an inline comment as done.

Check for an isEmpty was added.

max added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
144–148 ↗(On Diff #15670)

This is a good idea! But the problem here is that DeliveryBroker::getInstance().pop(toDeviceID) is a blocking read and it will wait indefinitely if no more new messages.

That's correct, but MPMCQueue is used underneath and it should be rather easy to use isEmpty method that it has. Nevertheless, if that's too complicated, we can leave without it, but not checking this reduces usefulness of this test significantly.

Omg, sorry I've missed that we have exposed the isEmpty method in a DeliverBroker. So that's easy, I've added this check. Thanks @tomek !

max marked an inline comment as done.

Fixing space.

services/tunnelbroker/test/AmqpManagerTest.cpp
149–151 ↗(On Diff #15794)

Just one thought: what do you think about joining the threads before we check if all the messages are delivered?

144–148 ↗(On Diff #15670)

Great, thanks!

Rebase on master changes, move threads joining before checks.

max added inline comments.
services/tunnelbroker/test/AmqpManagerTest.cpp
149–151 ↗(On Diff #15794)

Just one thought: what do you think about joining the threads before we check if all the messages are delivered?

Yes, we can do this. I've moved threads joining before checks. Thanks, @tomek.