Page MenuHomePhabricator

[services] Tunnelbroker - Handle SessionID collision without erroring, using random device to generate it.
ClosedPublic

Authored by max on Feb 3 2022, 7:02 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Jun 19, 2:38 PM
Unknown Object (File)
Wed, Jun 19, 2:37 PM
Unknown Object (File)
Wed, Jun 19, 2:37 PM
Unknown Object (File)
Wed, Jun 19, 2:37 PM
Unknown Object (File)
Wed, Jun 19, 2:37 PM
Unknown Object (File)
Wed, Jun 19, 2:37 PM
Unknown Object (File)
Wed, Jun 19, 2:35 PM
Unknown Object (File)
Wed, Jun 19, 2:35 PM

Details

Summary

This is the fix for the follow-up task ENG-621

Context from the task:

  • At the moment for sessionID creation, we rely on boost::uuids::random_generator() which instance creates every gRPC handler request that affects the random generator internal counter. We need to fix this by moving the generator out of the function or not using an internal counter.

Solution:
By default boost::uuids::random_generator() uses mt19937 which is not cryptographically secure. By switching to use std::random_device we will produce enough secure results because it uses /dev/(u)random and no need to move it outside the function.

  • When the unique SessionID has a collision with the already existing in the Database we are falling into the error instead of generating a new one.

Solution:
In case we already have a record with the same sessionID in the database we generate a new sessionID until we have not the record with the same sessionID (in case of multiple collisions).

Dependencies
Depends on D3124

Test Plan
  1. Does the session successfully created in case we have no collision.
  2. Does the session creation recover when we have a collision (by adding to the database record with the same sessionID).

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

max edited the test plan for this revision. (Show Details)
max added a reviewer: tomek.
tomek requested changes to this revision.Feb 4 2022, 2:28 AM
tomek added inline comments.
services/tunnelbroker/docker-server/contents/server/src/Service/TunnelbrokerServiceImpl.cpp
82–83 ↗(On Diff #9274)

I don't think this is the best approach. If you check https://en.cppreference.com/w/cpp/numeric/random/random_device, there's a note:

note: demo only: the performance of many
implementations of random_device degrades sharply
once the entropy pool is exhausted. For practical use
random_device is generally only used to seed
a PRNG such as mt19937

So in our case we need to construct an instance of mt19937 generator, seed it with the result of random_device and store as an instance variable. The current approach might have serious performance issues when the entropy is depleted

94–96 ↗(On Diff #9274)

It is technically a collision, but not something which we should worry about. If you think we need to log it, I would at least delete warning.

This revision now requires changes to proceed.Feb 4 2022, 2:28 AM

Moved uuid generation to Tools and separate function.

max added inline comments.
services/tunnelbroker/docker-server/contents/server/src/Service/TunnelbrokerServiceImpl.cpp
82–83 ↗(On Diff #9274)

I don't think this is the best approach. If you check https://en.cppreference.com/w/cpp/numeric/random/random_device, there's a note:

note: demo only: the performance of many
implementations of random_device degrades sharply
once the entropy pool is exhausted. For practical use
random_device is generally only used to seed
a PRNG such as mt19937

So in our case we need to construct an instance of mt19937 generator, seed it with the result of random_device and store as an instance variable. The current approach might have serious performance issues when the entropy is depleted

Moved generation code to the generateUUID function in Tools, using the suggested approach to generate it using seeded mt19937 and make them thread_local static.

94–96 ↗(On Diff #9274)

It is technically a collision, but not something which we should worry about. If you think we need to log it, I would at least delete warning.

Yes, but we definitely need to know this in server logs. If these collisions will repeatedly occur that will indicate that something is wrong or needs to be improved in a random generation.

tomek added inline comments.
services/tunnelbroker/docker-server/contents/server/src/Tools/Tools.cpp
58 ↗(On Diff #9308)

We should probably use just thread_local to be consistent with the codebase. Also in https://en.cppreference.com/w/cpp/language/storage_duration it is said that

It can be combined with static or extern to specify internal or external linkage (except for static data members which always have external linkage), respectively, but that additional static doesn't affect the storage duration.

So we don't need to specify the static.

This revision is now accepted and ready to land.Feb 7 2022, 1:49 AM
This revision now requires review to proceed.Feb 7 2022, 1:49 AM

Please address the thread_local static issue. What is it for?

ashoat added a reviewer: jim.

Thanks @palys-swm for the thoughtfulness around usecases for PRNGs. This situation is the opposite of the issue we had in October, where we were using a PRNG when we shouldn't have. Here it makes perfect sense to use a PRNG to avoid depleting the entropy pool.

It's not extremely clear to me how necessary this change is (what's the probability of a collision occurring, anyways?) but I guess it doesn't hurt to have this code.

This revision is now accepted and ready to land.Feb 7 2022, 6:54 PM

I don't understand the problem here.

boost::uuids::random_generator is secure and uses OS entropy (/dev/urandom) according to the docs. And the synchronization isn't an issue because in the current version of the code all instances are local, not global/static. Are you seeing collisions in practice? If used correctly, that should *never* happen statistically. The only reason not to use random_generator and random_device would be performance, but that doesn't seem like a likely issue.

ashoat requested changes to this revision.Feb 7 2022, 8:46 PM

Synced with @jimpo offline and he offered some clarification. Seems like the use cases for calling a PRNG such as mt19937 directly from C++ in 2022 are very limited, and we should generally be relying on the CSPRNG from the OS instead.

Screen Shot 2022-02-07 at 11.42.09 PM.png (1×2 px, 821 KB)

services/tunnelbroker/docker-server/contents/server/src/Service/TunnelbrokerServiceImpl.cpp
80–89 ↗(On Diff #9308)

We should probably just leave this as-is, and undo the changes

services/tunnelbroker/docker-server/contents/server/src/Tools/Tools.cpp
15–16 ↗(On Diff #9308)

We should probably switch to using something that sources from /dev/urandom here too?

This revision now requires changes to proceed.Feb 7 2022, 8:46 PM

For session keys and things like that, std::random_device is fine. If perf is an issue we could make a using Crypto++ primitives.

services/tunnelbroker/docker-server/contents/server/src/Service/TunnelbrokerServiceImpl.cpp
81 ↗(On Diff #9308)

Use boost::uuids::to_string here instead. https://www.boost.org/doc/libs/1_78_0/libs/uuid/doc/uuid.html#to_string

It's more clear IMO.

services/tunnelbroker/docker-server/contents/server/src/Tools/Tools.cpp
15–16 ↗(On Diff #9308)

Yeah, just drop the mt19937 generator.

std::random_device generator;

Making it thread-local might help performance a little bit so it keeps the same file descriptor.

max marked 2 inline comments as done.

Update random generator to drop mt19937.

max added inline comments.
services/tunnelbroker/docker-server/contents/server/src/Service/TunnelbrokerServiceImpl.cpp
80–89 ↗(On Diff #9308)

We should probably just leave this as-is, and undo the changes

I think we should stay with the generateUUID() as a function in Tools. The function code is small but it will be reused in different places. Now we used it once in sessions, but I plan to use it to generate message IDs, and maybe in future updates too. In case we want to change something in the generator it would be much easier to refactor and don't repeat the same code right now.

81 ↗(On Diff #9308)

Use boost::uuids::to_string here instead. https://www.boost.org/doc/libs/1_78_0/libs/uuid/doc/uuid.html#to_string

It's more clear IMO.

Already used it on the updates, find it after the first update, but thanks anyway ;) we don't need the cast here.

services/tunnelbroker/docker-server/contents/server/src/Tools/Tools.cpp
15–16 ↗(On Diff #9308)

We should probably switch to using something that sources from /dev/urandom here too?

Yeah, just drop the mt19937 generator.

All that makes sense to me so switched to std::random_device here.

58 ↗(On Diff #9308)

We should probably use just thread_local to be consistent with the codebase. Also in https://en.cppreference.com/w/cpp/language/storage_duration it is said that

It can be combined with static or extern to specify internal or external linkage (except for static data members which always have external linkage), respectively, but that additional static doesn't affect the storage duration.

So we don't need to specify the static.

Switched to thread_local only.

In D3103#82115, @karol-bisztyga wrote:

Please address the thread_local static issue. What is it for?

thread_local is used to refer to data that is seemingly global or static storage duration (from the viewpoint of the functions using it) but in actual fact, there is one copy per thread.

In D3103#82475, @jimpo wrote:

I don't understand the problem here.

boost::uuids::random_generator is secure and uses OS entropy (/dev/urandom) according to the docs. And the synchronization isn't an issue because in the current version of the code all instances are local, not global/static. Are you seeing collisions in practice? If used correctly, that should *never* happen statistically. The only reason not to use random_generator and random_device would be performance, but that doesn't seem like a likely issue.

I agree that collision in our case looks almost impossible. In the case of performance, we create sessions very rarely, and even if we have thousands of devices the new session requests will be rare.

services/tunnelbroker/docker-server/contents/server/src/Service/TunnelbrokerServiceImpl.cpp
81 ↗(On Diff #9423)

I say skip the database lookup and the while loop. Since the UUID is truly random now, a collision won't happen.

Remove sessionID collision loop check.

Make the newSessionID const. Stacked on top of D3124 to refactor the code.

Now in this diff we have only moved the generation function to the Tools and changed the generator.
Maybe I should change the diff title and description. What do you think @ashoat ?

services/tunnelbroker/docker-server/contents/server/src/Service/TunnelbrokerServiceImpl.cpp
81 ↗(On Diff #9423)

I say skip the database lookup and the while loop. Since the UUID is truly random now, a collision won't happen.

That makes sense. I've removed the DB fetch.

This revision is now accepted and ready to land.Feb 8 2022, 9:42 PM
max marked an inline comment as done.

Rebasing.