
[services] Dev Mode - create backups for AWS S3 and dynamoDB
Status: Abandoned (Public)

Authored by karol on Mar 23 2022, 7:56 AM.

Details

Reviewers: jim, tomek, max, varun

Summary

I think it would be good to keep track of dumps of the DynamoDB tables and the S3 buckets.
Every time someone changes a table, they should run the backup script so that the dumps in our repository get updated.
We can then easily recreate the S3 bucket structure and all the tables (without data) from the dynamo DB.
This can be useful for testing as well as for the tasks like setting up a local cloud (this is done in the upcoming diffs).
In general, I think having a backup of the structure of these resources is beneficial. What do you think?
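The summary does not say how a live table becomes one of the committed dumps. As a minimal sketch, assuming the dumps are derived from `aws dynamodb describe-table` output (the actual `backup_aws.sh` is not shown on this page), the structural fields can be kept and the runtime ones (status, item counts, ARNs, timestamps) dropped. The function name `to_create_table_input` is hypothetical.

```python
from typing import Any

# Hypothetical helper, not part of this diff.
# Structural top-level keys of a DescribeTable "Table" object. Everything
# else (TableStatus, ItemCount, TableArn, CreationDateTime, ...) is runtime
# state that is not needed to recreate an empty table.
TABLE_KEYS = ("TableName", "KeySchema", "AttributeDefinitions")
# Structural keys of each global secondary index description.
INDEX_KEYS = ("IndexName", "KeySchema", "Projection")


def to_create_table_input(table: dict[str, Any]) -> dict[str, Any]:
    """Reduce a DescribeTable "Table" object to a CreateTable-style input."""
    result = {key: table[key] for key in TABLE_KEYS}
    indexes = table.get("GlobalSecondaryIndexes")
    if indexes:
        result["GlobalSecondaryIndexes"] = [
            {key: index[key] for key in INDEX_KEYS} for index in indexes
        ]
    # DescribeTable reports the billing mode inside BillingModeSummary.
    # Provisioned tables would also need ProvisionedThroughput, which this
    # sketch does not carry over.
    summary = table.get("BillingModeSummary")
    if summary:
        result["BillingMode"] = summary["BillingMode"]
    return result
```

`aws dynamodb describe-table --table-name backup-service-log` prints an object of the form `{"Table": {...}}`, so the input here would be `json.loads(output)["Table"]`.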

Test Plan

cd services
yarn backup-aws

Diff Detail

Repository: rCOMM Comm

Lint: No Lint Coverage

Unit: No Test Coverage

Event Timeline

karol created this revision (Mar 23 2022, 7:56 AM).

Herald added subscribers: benschac, atul, adrian and 2 others (Mar 23 2022, 7:56 AM).

Harbormaster completed remote builds in B7536: Diff 10605 (Mar 23 2022, 8:01 AM).

karol requested review of this revision (Mar 23 2022, 8:01 AM).

karol edited the summary of this revision (Mar 23 2022, 8:03 AM).

karol edited the test plan for this revision.

karol added reviewers: jim, tomek, max, varun.

karol added a child revision: D3496: [services] Dev Mode - setup local cloud (Mar 23 2022, 8:07 AM).

TODO: It would be good to mark all the generated files as @generated so that Phabricator ignores them in review. But first, I'd like to know your opinion about the idea itself.

First, there should probably be separate diffs for S3 and DynamoDB as these are very different use cases in my opinion.

Starting with dynamodb, let me make sure I understand -- database schema is implicitly defined through requests to insert items and in the source code through the database entity classes, right? How are indexes defined? In the description, you say "Every time someone changes a table, they should run the backup script". What constitutes changing a table? How will I know if I change a table?

Similarly, where is S3 bucket structure defined?

I don't really understand the justification either. Is this just to help developers see the structure? Shouldn't they be looking at the Database entity classes for this? Or is this actually going to be used in scripts as implied by "This can be useful for testing as well as for the tasks like setting up a local cloud (this is done in the upcoming diffs)."

Overall, I don't really like this -- there should be a canonical representation of the database and filesystem (S3) schema in the source code, which will make updating the schema more visible and less error-prone.

This revision now requires changes to proceed (Apr 4 2022, 12:46 PM).

jim mentioned this in D3496: [services] Dev Mode - setup local cloud (Apr 4 2022, 12:49 PM).

> First, there should probably be separate diffs for S3 and DynamoDB as these are very different use cases in my opinion.

Yes, I know; the title even contains "and", which indicates it should be split. I didn't write it carefully because I wanted to get your opinion on the idea in general before spending too much time on it, as I knew it might be rejected outright.

> Starting with dynamodb, let me make sure I understand -- database schema is implicitly defined through requests to insert items and in the source code through the database entity classes, right?

Right. These things are really helpers for developers. There is no enforced schema: because it's NoSQL, you could suddenly start writing values to entirely new fields. Beyond the key attributes, the structure is just one we agreed on, and I think we should use tools like entities to maintain that structure and avoid errors.

> How are indexes defined?

You can see that we use some indexes in our code; check the DatabaseManager.cpp files of the different services. Other than that, indexes are not recorded anywhere except on the cloud, and that's the problem I'm trying to solve.

> In the description, you say "Every time someone changes a table, they should run the backup script". What constitutes changing a table? How will I know if I change a table?

I meant changing a table on the cloud: you log in to AWS, open the DynamoDB console, and change anything about the table (table name, partition key name, sort key name, etc.). The point is to keep track of this in our code, so that if we somehow lose the DB one day, we can at least recover the structure, and so that we can apply the same structure to another instance of the cloud (like the local cloud).
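To answer "How will I know if I change a table?" mechanically rather than by convention, one option is to compare the committed dump with a freshly generated dump of the live table and rerun the backup when they differ. A minimal sketch; the names `changed_fields` and `_normalized` are hypothetical, and it assumes both inputs use the same CreateTable-style keys as the committed files.

```python
from typing import Any


# Hypothetical helpers, not part of this diff.
def _normalized(definition: dict[str, Any]) -> dict[str, Any]:
    # AttributeDefinitions and GlobalSecondaryIndexes carry no meaningful
    # order, so sort them by name before comparing. KeySchema is left as-is
    # because its order is significant (the HASH key is listed first).
    result = dict(definition)
    result["AttributeDefinitions"] = sorted(
        definition.get("AttributeDefinitions", []),
        key=lambda attr: attr["AttributeName"],
    )
    result["GlobalSecondaryIndexes"] = sorted(
        definition.get("GlobalSecondaryIndexes", []),
        key=lambda index: index["IndexName"],
    )
    return result


def changed_fields(committed: dict[str, Any], live: dict[str, Any]) -> list[str]:
    """Return the top-level fields whose values differ between two dumps."""
    old, new = _normalized(committed), _normalized(live)
    return sorted(
        key for key in old.keys() | new.keys() if old.get(key) != new.get(key)
    )
```

An empty result means the committed dump is still accurate; renaming a sort key, for example, shows up as changes to both `KeySchema` and `AttributeDefinitions`.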

> Similarly, where is S3 bucket structure defined?

We use S3 bucket names in our code (for now it's just commapp-blob, as we decided we'll only access S3 from the blob service, but there are other buckets for different purposes). Apart from that, it's not defined anywhere, and that's what I'm trying to change. As with DynamoDB, every time someone modifies the S3 structure, they should run the backup script to update the S3 structure in our code.

> I don't really understand the justification either. Is this just to help developers see the structure? Shouldn't they be looking at the Database entity classes for this? Or is this actually going to be used in scripts as implied by "This can be useful for testing as well as for the tasks like setting up a local cloud (this is done in the upcoming diffs)."

Please read the description:

> We can then easily recreate the S3 bucket structure and all the tables (without data) from the dynamo DB.
> This can be useful for testing as well as for the tasks like setting up a local cloud (this is done in the upcoming diffs).

So, as I said above, every time we need to recreate the structure for some reason, this will be useful.
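Recreating the tables from the dumps can be sketched as below, assuming each file in `services/aws_backup/dynamo_db_tables/` holds one JSON object whose keys are CreateTable parameter names (as the files in this diff do). `recreate_tables` and `TableCreator` are hypothetical names; the client only needs a boto3-style `create_table(**kwargs)` method.

```python
import json
from pathlib import Path
from typing import Any, Protocol


class TableCreator(Protocol):
    """Anything with a boto3-style create_table(**kwargs) method."""

    def create_table(self, **kwargs: Any) -> Any: ...


# Hypothetical helper, not part of this diff.
def recreate_tables(dump_dir: Path, client: TableCreator) -> list[str]:
    """Create one empty table per JSON dump file in dump_dir, in name order."""
    created = []
    for path in sorted(dump_dir.iterdir()):
        if not path.is_file():
            continue
        definition = json.loads(path.read_text())
        # Each dump already uses CreateTable parameter names, so it can be
        # passed through unchanged.
        client.create_table(**definition)
        created.append(definition["TableName"])
    return created
```

For a local cloud, the client could be `boto3.client("dynamodb", endpoint_url="http://localhost:8000")`, assuming DynamoDB Local is running on its default port.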

> Overall, I don't really like this -- there should be a canonical representation of the database and filesystem (S3) schema in the source code, which will make updating the schema more visible and less error-prone.

Sorry, I don't understand. What do you mean by "canonical"? What alternatives do you see? Please remember that the goal here is to keep track of what's on the cloud.

@karol-bisztyga, I think @jimpo is suggesting having the production database schema defined in code, and the schema setup being automatically handled based on that code using an infrastructure-as-code platform such as Terraform. That is in contrast with your approach here, where you are instead having the schema setup handled by the individual developer, and then backed up as code. @jimpo correct me if I'm wrong?

Yes. I didn't mention it in this diff, but I assumed we could use Terraform in https://phabricator.ashoat.com/D3496#100509. If we both agree on that, I think we can do this.
https://linear.app/comm/issue/ENG-989/use-terraform-to-set-up-the-cloud
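To illustrate how the dumped fields line up with the infrastructure-as-code approach discussed here, the sketch below maps one table dump onto Terraform's `aws_dynamodb_table` resource (arguments `name`, `hash_key`, `range_key`, `billing_mode`, `attribute`, `global_secondary_index`) in Terraform's JSON syntax. It is not what the follow-up stack (D3695) does; `to_terraform`, `render`, and the underscore naming rule are hypothetical.

```python
import json
from typing import Any, Optional


# Hypothetical helpers, not part of this diff or of D3695.
def _split_keys(key_schema: list[dict[str, str]]) -> tuple[str, Optional[str]]:
    # Return (hash key, range key or None) from a DynamoDB KeySchema list.
    hash_key = next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")
    range_key = next(
        (k["AttributeName"] for k in key_schema if k["KeyType"] == "RANGE"), None
    )
    return hash_key, range_key


def to_terraform(definition: dict[str, Any]) -> dict[str, Any]:
    """Map one committed table dump onto an aws_dynamodb_table resource."""
    hash_key, range_key = _split_keys(definition["KeySchema"])
    table: dict[str, Any] = {
        "name": definition["TableName"],
        "hash_key": hash_key,
        "attribute": [
            {"name": a["AttributeName"], "type": a["AttributeType"]}
            for a in definition["AttributeDefinitions"]
        ],
    }
    if range_key:
        table["range_key"] = range_key
    if "BillingMode" in definition:
        table["billing_mode"] = definition["BillingMode"]
    indexes = []
    for index in definition.get("GlobalSecondaryIndexes", []):
        index_hash, index_range = _split_keys(index["KeySchema"])
        block: dict[str, Any] = {
            "name": index["IndexName"],
            "hash_key": index_hash,
            "projection_type": index["Projection"]["ProjectionType"],
        }
        if index_range:
            block["range_key"] = index_range
        if index["Projection"].get("NonKeyAttributes"):
            block["non_key_attributes"] = index["Projection"]["NonKeyAttributes"]
        indexes.append(block)
    if indexes:
        table["global_secondary_index"] = indexes
    # Underscores are the conventional style for Terraform resource labels.
    label = definition["TableName"].replace("-", "_")
    return {"resource": {"aws_dynamodb_table": {label: table}}}


def render(definition: dict[str, Any]) -> str:
    """Serialize to Terraform's JSON syntax (suitable for a .tf.json file)."""
    return json.dumps(to_terraform(definition), indent=2)
```

Terraform loads `*.tf.json` files alongside `*.tf` files, so output like this could be reviewed as a starting point rather than hand-translating each table.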

I changed my approach and now use Terraform for this, so I'm abandoning this revision. I decided it would be faster to set up a new stack, since I reordered the diffs and many of them differ from the ones in this stack. Let's follow up in the stack beginning at D3695.

Revision Contents
Changeset List

Path                                            Size
services/
    aws_backup/
        dynamo_db_tables/
            backup-service-backup               68 lines
            backup-service-backup-test          49 lines
            backup-service-log                  24 lines
            backup-service-log-test             24 lines
            blob-service-blob                   16 lines
            blob-service-blob-test              16 lines
            blob-service-reverse-index          34 lines
            blob-service-reverse-index-test     34 lines
            tunnelbroker-device-session         16 lines
            tunnelbroker-message                24 lines
            tunnelbroker-messages               24 lines
            tunnelbroker-public-key             16 lines
            tunnelbroker-verification-message   16 lines
        s3.dump                                 5 lines
    package.json                                3 lines
    scripts/
        backup_aws.sh                           41 lines
        list_services.sh                        2 lines

Diff 10605


services/aws_backup/dynamo_db_tables/backup-service-backup

This file was added.

	{
	    "TableName": "backup-service-backup",
	    "KeySchema": [
	        {
	            "AttributeName": "userID",
	            "KeyType": "HASH"
	        },
	        {
	            "AttributeName": "backupID",
	            "KeyType": "RANGE"
	        }
	    ],
	    "AttributeDefinitions": [
	        {
	            "AttributeName": "backupID",
	            "AttributeType": "S"
	        },
	        {
	            "AttributeName": "created",
	            "AttributeType": "S"
	        },
	        {
	            "AttributeName": "userID",
	            "AttributeType": "S"
	        }
	    ],
	    "GlobalSecondaryIndexes": [
	        {
	            "IndexName": "userID-created-index-2",
	            "KeySchema": [
	                {
	                    "AttributeName": "userID",
	                    "KeyType": "HASH"
	                },
	                {
	                    "AttributeName": "created",
	                    "KeyType": "RANGE"
	                }
	            ],
	            "Projection": {
	                "ProjectionType": "INCLUDE",
	                "NonKeyAttributes": [
	                    "recoveryData"
	                ]
	            }
	        },
	        {
	            "IndexName": "userID-created-index",
	            "KeySchema": [
	                {
	                    "AttributeName": "userID",
	                    "KeyType": "HASH"
	                },
	                {
	                    "AttributeName": "created",
	                    "KeyType": "RANGE"
	                }
	            ],
	            "Projection": {
	                "ProjectionType": "INCLUDE",
	                "NonKeyAttributes": [
	                    "reverseIndex"
	                ]
	            }
	        }
	    ],
	    "BillingMode": "PAY_PER_REQUEST"
	}
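In this dump, `created` appears in `AttributeDefinitions` even though it is not part of the table's primary key: it is the sort key of both global secondary indexes. `recoveryData` and `reverseIndex` are only projected non-key attributes, so they are not listed. DynamoDB's CreateTable rejects a request unless `AttributeDefinitions` covers exactly the attributes used in the table and index key schemas, so a hand-edited dump could be sanity-checked before use. A minimal sketch with hypothetical function names:

```python
from typing import Any


# Hypothetical helpers, not part of this diff.
def key_attributes(definition: dict[str, Any]) -> set[str]:
    """Names used in the table's key schema or in any secondary index's."""
    names = {key["AttributeName"] for key in definition["KeySchema"]}
    for field in ("GlobalSecondaryIndexes", "LocalSecondaryIndexes"):
        for index in definition.get(field, []):
            names |= {key["AttributeName"] for key in index["KeySchema"]}
    return names


def check_attribute_definitions(
    definition: dict[str, Any],
) -> tuple[set[str], set[str]]:
    """Return (used but undefined, defined but unused) attribute names.

    CreateTable accepts the definition only if both sets are empty.
    """
    defined = {attr["AttributeName"] for attr in definition["AttributeDefinitions"]}
    used = key_attributes(definition)
    return used - defined, defined - used
```

Deleting both indexes from this dump without also removing `created` from `AttributeDefinitions` would leave `created` in the "defined but unused" set.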


services/aws_backup/dynamo_db_tables/backup-service-backup-test

This file was added.

	{
	    "TableName": "backup-service-backup-test",
	    "KeySchema": [
	        {
	            "AttributeName": "userID",
	            "KeyType": "HASH"
	        },
	        {
	            "AttributeName": "backupID",
	            "KeyType": "RANGE"
	        }
	    ],
	    "AttributeDefinitions": [
	        {
	            "AttributeName": "backupID",
	            "AttributeType": "S"
	        },
	        {
	            "AttributeName": "created",
	            "AttributeType": "S"
	        },
	        {
	            "AttributeName": "userID",
	            "AttributeType": "S"
	        }
	    ],
	    "GlobalSecondaryIndexes": [
	        {
	            "IndexName": "userID-created-index",
	            "KeySchema": [
	                {
	                    "AttributeName": "userID",
	                    "KeyType": "HASH"
	                },
	                {
	                    "AttributeName": "created",
	                    "KeyType": "RANGE"
	                }
	            ],
	            "Projection": {
	                "ProjectionType": "INCLUDE",
	                "NonKeyAttributes": [
	                    "recoveryData"
	                ]
	            }
	        }
	    ],
	    "BillingMode": "PAY_PER_REQUEST"
	}
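This test table has already drifted from backup-service-backup: the production dump has an extra index, `userID-created-index-2`, and its `userID-created-index` projects `reverseIndex`, whereas the test table's index of the same name projects `recoveryData`. The thread does not say whether this is intended. Because such differences are only visible by diffing the dumps by eye, a small comparison could flag them automatically. A minimal sketch; `index_projections` and `compare_indexes` are hypothetical names.

```python
from typing import Any


# Hypothetical helpers, not part of this diff.
def index_projections(definition: dict[str, Any]) -> dict[str, tuple]:
    """Map each GSI name to (projection type, projected non-key attributes)."""
    return {
        index["IndexName"]: (
            index["Projection"]["ProjectionType"],
            tuple(index["Projection"].get("NonKeyAttributes", [])),
        )
        for index in definition.get("GlobalSecondaryIndexes", [])
    }


def compare_indexes(prod: dict[str, Any], test: dict[str, Any]) -> dict[str, list[str]]:
    """Report GSI differences between two table definitions."""
    prod_idx, test_idx = index_projections(prod), index_projections(test)
    shared = prod_idx.keys() & test_idx.keys()
    return {
        "only_in_prod": sorted(prod_idx.keys() - test_idx.keys()),
        "only_in_test": sorted(test_idx.keys() - prod_idx.keys()),
        "different_projection": sorted(
            name for name in shared if prod_idx[name] != test_idx[name]
        ),
    }
```

Applied to the two backup-service-backup dumps, this would report `userID-created-index-2` as production-only and `userID-created-index` as having a different projection.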


services/aws_backup/dynamo_db_tables/backup-service-log

This file was added.

	{
	    "TableName": "backup-service-log",
	    "KeySchema": [
	        {
	            "AttributeName": "backupID",
	            "KeyType": "HASH"
	        },
	        {
	            "AttributeName": "logID",
	            "KeyType": "RANGE"
	        }
	    ],
	    "AttributeDefinitions": [
	        {
	            "AttributeName": "backupID",
	            "AttributeType": "S"
	        },
	        {
	            "AttributeName": "logID",
	            "AttributeType": "S"
	        }
	    ],
	    "BillingMode": "PAY_PER_REQUEST"
	}


services/aws_backup/dynamo_db_tables/backup-service-log-test

This file was added.

	{
	    "TableName": "backup-service-log-test",
	    "KeySchema": [
	        {
	            "AttributeName": "backupID",
	            "KeyType": "HASH"
	        },
	        {
	            "AttributeName": "logID",
	            "KeyType": "RANGE"
	        }
	    ],
	    "AttributeDefinitions": [
	        {
	            "AttributeName": "backupID",
	            "AttributeType": "S"
	        },
	        {
	            "AttributeName": "logID",
	            "AttributeType": "S"
	        }
	    ],
	    "BillingMode": "PAY_PER_REQUEST"
	}