Page MenuHomePhabricator

[tunnelbroker] Implement messages TTL after device deletion
AcceptedPublic

Authored by bartek on Jul 29 2024, 6:08 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Sep 8, 12:41 PM
Unknown Object (File)
Sun, Sep 1, 7:34 AM
Unknown Object (File)
Thu, Aug 29, 8:16 PM
Unknown Object (File)
Wed, Aug 28, 10:27 PM
Unknown Object (File)
Wed, Aug 28, 12:58 PM
Unknown Object (File)
Wed, Aug 28, 12:58 PM
Unknown Object (File)
Wed, Aug 28, 12:57 PM
Unknown Object (File)
Mon, Aug 26, 4:15 PM
Subscribers

Details

Reviewers
kamil
Summary

Addresses ENG-8548 and ENG-8495.
Added a TTL for undelivered messages attribute to DynamoDB and set it during device removal for all device messages.

Test Plan

Tested locally:

  1. Set the TTL to a short value (1 minute)
  2. Sent a few messages to a device that wasn't currently connected (to keep them stored in DDB)
  3. Called the DeleteDeviceData RPC
  4. Used awslocal dynamodb scan to see that the TTL attribute was added
  5. Waited a minute, then ran 4. again. Messages were gone

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

bartek held this revision as a draft.
bartek published this revision for review.Jul 29 2024, 7:13 AM
bartek added inline comments.
services/tunnelbroker/src/constants.rs
22–24

What should the value of this TTL be? This is for how long we store undelivered messages after device logout and account deletion.
cc @ashoat

services/tunnelbroker/src/database/mod.rs
157–163

Not sure if I should do TransactUpdate instead (which is more involved).

  • With batch write, there's a chance that a message gets deleted meanwhile by other requests, but batch write will save it again. It will be deleted anyway when TTL expires, but there's a small chance that a device could receive the same message twice. The chance is small because if the device is being deleted, it's likely to be already logged out.
  • With transact update, the above problem disappears, but there's a chance that a message, after receiving confirmation, won't be deleted by other requests, because of transaction conflict. Also, the transaction might fail during to one of the messages just being removed. In other words, there are many edge cases to handle and I'm wondering if it's worth it.
kamil added inline comments.
services/tunnelbroker/src/constants.rs
22–24

I think it could stay longer, e.g. one week but not sure what is the best number here

services/tunnelbroker/src/database/mod.rs
157–163

I think there is a really low chance for the message to be received again, even if most of our communication is encrypted, that being said this message will fail to decrypt and the client will ignore this anyway.

I think this is fine and an additional mechanism is not worth it.

This revision is now accepted and ready to land.Jul 30 2024, 3:59 AM
services/tunnelbroker/src/constants.rs
22–24

Can you explain the implications of this decision?

We should make sure that if a primary device logs out, the secondary device always "receives the message" in the sense that it knows it is logged out

It would be very broken if a user logs out of their primary device, and then opens their secondary a week later and it appears as if it's still logged in

Perhaps the risk will be mitigated because the device will attempt to connect with identity using its CSAT and find out that it's invalidated? Not sure if this check happens on every app open or not

services/tunnelbroker/src/constants.rs
22–24

Is there a risk that a primary will miss a message about a secondary logging out, and then not update the device list?

services/tunnelbroker/src/constants.rs
22–24

In fact, the time required for the message to be delivered to the device being removed is a few seconds max, assuming the removed device is connected to Tunnelbroker. So to address my concern from this comment, the TTL could be e.g. 1 minute as well.
The only reason I put 24hrs is because initially we were thinking of some nanny task that would delete old messages, and the task could be run daily.
And my question was more about data policy - e.g. for how long we can store users data after they delete their accounts.


If the device is not connected to Tunnelbroker and it's been logged out on Identity, it actually won't have a chance to connect to Tunnelbroker because of revoked CSAT. So it won't receive any further messages.

Perhaps the risk will be mitigated because the device will attempt to connect with identity using its CSAT and find out that it's invalidated? Not sure if this check happens on every app open or not

Yes, this is the mechanism that will log out the devices if they don't receive the message. It should work already (IIRC we have some handlers that log out on missing CSAT).
Moreover, besides app start, any service actions will log out if CSAT is invalid - ENG-8401.


Is there a risk that a primary will miss a message about a secondary logging out, and then not update the device list?

No, this scenario is safe here. Both primary and secondary are still logged in when the message is sent; moreover, the recipient (primary) won't be logged out so its messages won't be deleted. It'll receive the message as soon as it becomes online.

Something we should add to the test plan:

  1. User logs out of primary
  2. We wait some time, enough for Tunnelbroker to delete the messages
  3. We open a secondary device. The secondary device should be logged out

My impression is that this won't work without first addressing ENG-8401, in which case that might need to be considered in-scope here.