Page MenuHomePhabricator

[services] Backup - dev mode
AbandonedPublic

Authored by karol on Feb 15 2022, 5:18 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Dec 20, 5:34 PM
Unknown Object (File)
Fri, Dec 20, 5:34 PM
Unknown Object (File)
Fri, Dec 20, 5:34 PM
Unknown Object (File)
Fri, Dec 20, 5:34 PM
Unknown Object (File)
Fri, Dec 20, 5:34 PM
Unknown Object (File)
Fri, Dec 20, 5:31 PM
Unknown Object (File)
Thu, Dec 12, 9:03 AM
Unknown Object (File)
Wed, Nov 27, 6:03 AM

Details

Summary

Dev mode for the database operations in backup

Depends on D3268

Test Plan
cd services
yarn test-backup-service-dev-mode
yarn run-backup-service-dev-mode

Diff Detail

Repository
rCOMM Comm
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
karol edited the summary of this revision. (Show Details)
karol edited the test plan for this revision. (Show Details)
karol added reviewers: tomek, max, varun.
tomek requested changes to this revision.Feb 16 2022, 5:45 AM
tomek added inline comments.
services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
34–43 ↗(On Diff #9664)

This isn't too effective. We're calling dbSimulator.backup.find(userID) thrice... is it because it stores unique ptrs? Maybe we should use shared instead?

services/backup/docker-server/contents/server/test/DatabaseManagerTest.cpp
31–35 ↗(On Diff #9664)

That makes sense. I think that we could add a test where all the times are the same and we expect the failure

This revision now requires changes to proceed.Feb 16 2022, 5:45 AM
karol added inline comments.
services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
34–43 ↗(On Diff #9664)

I know, yes this is because there's a unique pointer there. We have to use it to be able to modify it because of the way that the folly's concurrent hash map works. I've not successfully hacked this collection to be able to reuse the find result, unfortunately. that's the best I can pull off from this.

services/backup/docker-server/contents/server/test/DatabaseManagerTest.cpp
31–35 ↗(On Diff #9664)

Yes, we can add such a test.

tomek added inline comments.
services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
34–43 ↗(On Diff #9664)

Ok, it makes some sense that folly doesn't allow saving a reference to an item. Have you checked if it allows saving an iterator?

This revision is now accepted and ready to land.Feb 21 2022, 5:26 AM
This revision now requires review to proceed.Feb 21 2022, 5:26 AM
ashoat requested changes to this revision.Feb 22 2022, 12:30 AM
ashoat added a subscriber: jim.

Requesting changes to get an answer to a question. I recall seeing some discussion on Comm about dev mode, and @jimpo suggested a different approach than the one we had been taking previously. I don't recall the details there... can you share that context in this diff, and clarify if you're using the "old" approach or the new one suggested by @jimpo? (@jimpo is out this week, so might be able to clarify this for you unfortunately)

This revision now requires changes to proceed.Feb 22 2022, 12:30 AM

This is the old approach, the new one uses some kind of local instance of the cloud. I'm not sure but I might've put this up before we talked about this in the channel. I think we can land this and refactor the dev mode separately https://linear.app/comm/issue/ENG-795/use-local-cloud-instance-for-dev-mode-in-services

ashoat requested changes to this revision.Feb 22 2022, 12:53 AM
ashoat added inline comments.
services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
34–43 ↗(On Diff #9664)

Based on the suggestion from my friend here, it seems like what @palys-swm is suggesting may be possible.

@karol-bisztyga, can you look into this? I know dev mode is not a big deal, and the code quality does not matter quite as much here. But I think it would be good for you to master ConcurrentHashMap and MPMCQueue, and I think this learning will help you in many other places too.

This revision now requires changes to proceed.Feb 22 2022, 12:53 AM

This is the old approach, the new one uses some kind of local instance of the cloud. I'm not sure but I might've put this up before we talked about this in the channel. I think we can land this and refactor the dev mode separately https://linear.app/comm/issue/ENG-795/use-local-cloud-instance-for-dev-mode-in-services

Thanks for clarifying and creating a Linear task to track. It makes sense to land this as-is and follow-up later. However, one concern: you didn't subscribe anybody to the task you created. As discussed last Monday, please always make sure people are subscribed when creating a task.

Thanks for clarifying and creating a Linear task to track. It makes sense to land this as-is and follow-up later. However, one concern: you didn't subscribe anybody to the task you created. As discussed last Monday, please always make sure people are subscribed when creating a task.

Oops, sorry, my bad, missed that.

services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
34–43 ↗(On Diff #9664)

You are right, OK, I'm going to try to hack this. If I'm not gonna make it in a decent amount of time, I'm going to document what I have tried.

karol edited the summary of this revision. (Show Details)

fix multiple finds

tomek requested changes to this revision.Feb 22 2022, 7:16 AM

It's great that you found a way to avoid repeated finds! Just one question inline, as after this refactoring it's more noticeable that this code might crash.

services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
42 ↗(On Diff #9818)

What happens when it == this->dbSimulator.backup.begin()?

This revision now requires changes to proceed.Feb 22 2022, 7:16 AM
karol added inline comments.
services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
42 ↗(On Diff #9818)

It works ok. I'm not sure why this could be some special case to be handled. The end() here refers to the underlying collection that is in it->second(it's a std::map) and we check above if it's empty and return a nullptr if so.

tomek requested changes to this revision.Feb 23 2022, 3:04 AM
tomek added inline comments.
services/backup/docker-server/contents/server/src/DatabaseManager.dev.cpp
42 ↗(On Diff #9818)

Ahhh, ok, that makes sense. I incorrectly assumed that --it->second->end() means (--it)->second->end() while instead it is --(it->second->end()) - sorry for the confusion.

46–51 ↗(On Diff #9818)

Here we can also use only one find and store an iterator

65–76 ↗(On Diff #9818)

I know that this is only dev code, but this might be really slow, if we always need to iterate through all the logs. What do you think about storing these in a map similar to the backup's one? So instead of having a list of logs, we would have a map from backup id to a list of logs.
I think that it might be important because even on dev there could be a lot of logs.

services/backup/docker-server/contents/server/test/DatabaseManagerTest.cpp
31–35 ↗(On Diff #9818)

This worries me a bit. So if we have a collision in the real service, does the client need to retry? I know that the connection introduces the delay, but if we have multiple threads running this code, the delay doesn't protect us against the collisions.

This revision now requires changes to proceed.Feb 23 2022, 3:04 AM
services/backup/docker-server/contents/server/test/DatabaseManagerTest.cpp
31–35 ↗(On Diff #9818)

This problem only appears in the dev mode because of the way we use the collections. and it only will occur in tests because in the service run locally we will have a delay that's going to come from the grpc connection.

if we have multiple threads running this code, the delay doesn't protect us against the collisions

I'm assuming one user will only call from one place, therefore there will be no collisions that will come from multiple threads as the collection is: userID -> map(created -> item)

tomek added inline comments.
services/backup/docker-server/contents/server/test/DatabaseManagerTest.cpp
31–35 ↗(On Diff #9818)

So it sounds like a collision is possible when one user would call this from two different devices. Ideally, we could include a kind of device id here, which would make this safe, but this scenario is so unlikely, we probably can leave it as it is.

ashoat requested changes to this revision.Feb 27 2022, 9:05 PM

Just a question inline

services/backup/docker-server/contents/server/dev/DatabaseSimulator.h
20–28

Is there a good reason we are using thread-safe collections at the top level (ConcurrentHashMap), but with non-thread-safe collections nested inside (std::map, std::vector)? It seems like it defeats the whole purpose...

services/backup/docker-server/contents/server/test/DatabaseManagerTest.cpp
34

@karol-bisztyga, I've noticed a pattern you have where you consistently forget to put a space before the beginning of a parenthetical. Going forward, can you please always include a space before an open paren when writing English?

This revision now requires changes to proceed.Feb 27 2022, 9:05 PM
In D3200#86984, @karol-bisztyga wrote:

This is the old approach, the new one uses some kind of local instance of the cloud. I'm not sure but I might've put this up before we talked about this in the channel. I think we can land this and refactor the dev mode separately https://linear.app/comm/issue/ENG-795/use-local-cloud-instance-for-dev-mode-in-services

Do we expect this code to get removed then after ENG-795 is implemented? Fine to land for now, just want to be clear.

Do we expect this code to get removed then after ENG-795 is implemented? Fine to land for now, just want to be clear.

Yes.

I don't think we should pay a lot of attention to this since it's going to be removed anyway pretty soon.

services/backup/docker-server/contents/server/dev/DatabaseSimulator.h
20–28

I don't think there's a difference between when the value in the thread-safe collection is just a variable or another collection. Either way, the value is locked while it's used, right? So it won't be accessed by another thread. I may be wrong about this.

services/backup/docker-server/contents/server/test/DatabaseManagerTest.cpp
34

Hmm, OK, I will try to keep that in mind. Interesting that Grammarly doesn't perceive this as an error.

ADDRESS BEFORE LANDING

I don't think we should pay a lot of attention to this since it's going to be removed anyway pretty soon.

Okay fair enough, but I still feel we should understand ConcurrentHashMap better. Before landing, can you address the inline question?

services/backup/docker-server/contents/server/dev/DatabaseSimulator.h
20–28

Hmm, can you research this more? You may be right, but the docs for ConcurrentHashMap say:

Readers are always wait-free.
Writers are sharded, but take a lock.

Does that imply reads don't take a lock?

I was trying to figure out where DatabaseSimulator is used, but I can't find it in any dependent diff. Can you point out where it's used?

This revision is now accepted and ready to land.Mar 7 2022, 9:53 PM
ashoat requested changes to this revision.Mar 18 2022, 1:36 PM
ashoat added a reviewer: jim.

Over chat, @karol-bisztyga asked:

should we land or abandon https://phabricator.ashoat.com/D3200? For now, I don't have time to investigate folly's collections and I'd like to proceed with the stack. There's a comment from @ashoat "address before landing".

Sharing my perspective here so it doesn't get lost.

I don't think we should land this diff as-is. I certainly don't think we should abandon it. I think we should do one of the following:

  1. Make sure the collection is threadsafe by using threadsafe collections throughout. Instead of std::map use folly::ConcurrentHashMap. Instead of std::vector use folly::MPMCQueue
  2. Make sure we don't actually need the collection to be threadsafe, and then remove the concurrent collections entirely. So instead of folly::ConcurrentHashMap, we could just use std::map
  3. Research Folly more to figure out if there is a valid way to use a threadsafe collection where the value in the collection is not itself threadsafe (I suspect there is not, but I am far from an expert)

In short: if you need the collection to be threadsafe, then we need to be more confident that it is actually threadsafe. I don't think it currently is. On the other hand, if we don't need the collection to be threadsafe, then there is no reason to be using threadsafe primitives.

Would love @jimpo's perspective on this, he understands thread safety better than I do.

This revision now requires changes to proceed.Mar 18 2022, 1:36 PM

I think it's much better to abandon this and use a local dynamodb service in the docker-compose like https://hub.docker.com/r/amazon/dynamodb-local or https://github.com/localstack/localstack. It's a much simpler and more maintainable solution.

@karol-bisztyga, curious what you think of @jimpo's suggestion. I definitely think that would be preferable, and don't want to waste your time on this now if we're just going to get rid of it soon. The big question for me is how easy it would be to get a local service running... if it just takes a day or so, that's clearly preferable. (PS this is tracked in ENG-795)

This was my point all this time, please see https://phabricator.ashoat.com/D3200#86984, the only difference is instead of landing we can abandon. I also mentioned once again here https://phabricator.ashoat.com/D3200#89895 that we'll remove it soon.

Cool, I've bumped ENG-795 to the current cycle.

This is still the big question for me:

The big question for me is how easy it would be to get a local service running... if it just takes a day or so, that's clearly preferable

Based on our discussion and Ashoat's final decision(quote from the comm app):

I'm okay to abandon it. I would probably prefer that we first investigate the cost of the local service work – would be great if we could scope it to 1-3 days. and then after confirming we are going to pursue the local service, then I think it makes sense to abandon

I'm abandoning this one.