Page MenuHomePhabricator

[keyserver] Script to scp backups over from a server
ClosedPublic

Authored by ashoat on Jun 16 2022, 6:00 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Nov 24, 4:36 PM
Unknown Object (File)
Sat, Nov 23, 10:51 PM
Unknown Object (File)
Sat, Nov 23, 10:47 PM
Unknown Object (File)
Sat, Nov 23, 10:45 PM
Unknown Object (File)
Sat, Nov 23, 7:45 PM
Unknown Object (File)
Wed, Nov 13, 7:30 PM
Unknown Object (File)
Fri, Nov 8, 3:03 PM
Unknown Object (File)
Sat, Nov 2, 12:59 PM
Subscribers

Details

Summary

The Docker host can generate backups, but the backups are useless if they're stored in the same place.

This script is intended to be run on a different machine. It will scp the latest backup from a source folder and make sure disk usage doesn't exceed some limit in the destination.

Test Plan
  • I copy-pasted this script to my personal server, where I ran it to scp files over from my keyserver on AWS: bash/scp-backup.sh backups backups
  • I created dummy backups in the destination to fill up disk space
  • I made sure that when disk space was exceeded, the oldest backups got deleted first
  • I made sure that the cleanup never deleted all backups (there was always one left, even if it's bigger than the limit)

Diff Detail

Repository
rCOMM Comm
Branch
ashoat/backups
Lint
No Lint Coverage
Unit
No Test Coverage

Event Timeline

Overall logic looks good and makes sense, just some minor style fixes and should be ready to land!

keyserver/bash/scp-backup.sh
15–16 ↗(On Diff #13547)

Not that it really matters in this particular instance since the command ends right after the variable is set, but you can use a here string instead of the echo + pipe. In my opinion it looks cleaner and if the script did go further, you could run into some issues using echo.
{F77143}

30 ↗(On Diff #13547)

Good practice to double quote to prevent globbing and word splitting. Not going to go through every example, but there are other instances of variables that should be double-quoted on lines 15, 16, and 17.

40 ↗(On Diff #13547)

Another minor thing, but couldn't we combine the grep + awk command into one awk command by passing total as a pattern in awk? Not sure about performance, but I don't think it's any worse and it looks cleaner to me.

Also curious why we need the -c flag on the du command? Seems like (at least in my limited experience) -cs just prints the total number of bytes twice (except with a total describing one line). Is this just a sanity check or can this be removed? If it can be removed, we don't need the /total/ pattern match in the awk command.

44 ↗(On Diff #13547)

Another suggestion: since ls is intended to be human-readable, its output could change in the future. For this example it probably isn't necessary but just in case. Use find instead of ls to better handle non-alphanumeric filenames.

You could maybe try something like this:
find "$SCP_DEST_FOLDER" -maxdepth 1 -name "comm.*.sql.gz" | wc -l

This revision now requires changes to proceed.Jun 17 2022, 10:05 AM

Great review!!

keyserver/bash/scp-backup.sh
15–16 ↗(On Diff #13547)

Looks like the attachment got dropped?

keyserver/bash/scp-backup.sh
15–16 ↗(On Diff #13547)

Huh, that's weird, I see it on my page (even after refreshing) but it isn't showing up if I open this diff in a window where I'm not signed in to Phabricator. Could be related to the "File not attached"/"referenced file" issue @atul was talking about the other day.

Let me know if you see it now, if not, it was just a screenshot of my shell showing that the cut command with the here string produced equivalent output to the echo + pipe:
{F77206}

Looks good to me! Accepting, assuming the Test Plan was retried on the latest version of the script.

This revision is now accepted and ready to land.Jun 18 2022, 12:53 PM
keyserver/bash/scp-backup.sh
40

Aside, but any thoughts on why we need the -c flag in this du command? It prints the same information as du -s but with an extra total line? The reason I am asking is because then the code could be modified to the Suggested Edit, removing the need for the /total/ pattern match.

I mentioned this in an inline comment, but wanted to hear your thoughts on why the -c was left in. Thanks!

Oops, I had a comment queued up but I totally forgot to send it! The point of -c to du is so that we can sum up the results of files that fit the /dev/shm/comm.*.sql.gz pattern. It's not necessary if you just want the size of a directory, but since we want all of the files of a certain pattern in a directory, we need -c so that the total line shows up at the bottom.

keyserver/bash/scp-backup.sh
40 ↗(On Diff #13547)

-c is what makes the total line show up.

Compare these results:

ashoat@server [~]$ sudo du -cs /dev/shm/comm.*.sql.gz
1651628	/dev/shm/comm.2022-06-12-16:00.sql.gz
1664880	/dev/shm/comm.2022-06-12-20:00.sql.gz
1664880	/dev/shm/comm.2022-06-13-00:00.sql.gz
1667488	/dev/shm/comm.2022-06-13-04:00.sql.gz
1667512	/dev/shm/comm.2022-06-13-08:00.sql.gz
1669956	/dev/shm/comm.2022-06-13-12:00.sql.gz
1670820	/dev/shm/comm.2022-06-13-16:00.sql.gz
1670832	/dev/shm/comm.2022-06-13-20:00.sql.gz
1670844	/dev/shm/comm.2022-06-13-23:00.sql.gz
14998840	total
ashoat@server [~]$ sudo du -s /dev/shm/comm.*.sql.gz
1651628	/dev/shm/comm.2022-06-12-16:00.sql.gz
1664880	/dev/shm/comm.2022-06-12-20:00.sql.gz
1664880	/dev/shm/comm.2022-06-13-00:00.sql.gz
1667488	/dev/shm/comm.2022-06-13-04:00.sql.gz
1667512	/dev/shm/comm.2022-06-13-08:00.sql.gz
1669956	/dev/shm/comm.2022-06-13-12:00.sql.gz
1670820	/dev/shm/comm.2022-06-13-16:00.sql.gz
1670832	/dev/shm/comm.2022-06-13-20:00.sql.gz
1670844	/dev/shm/comm.2022-06-13-23:00.sql.gz