Page MenuHomePhabricator

[keyserver] Script to scp backups over from a server
ClosedPublic

Authored by ashoat on Jun 16 2022, 6:00 PM.
Tags
None
Referenced Files
F3356642: D4288.diff
Sat, Nov 23, 7:45 PM
Unknown Object (File)
Wed, Nov 13, 7:30 PM
Unknown Object (File)
Fri, Nov 8, 3:03 PM
Unknown Object (File)
Sat, Nov 2, 12:59 PM
Unknown Object (File)
Oct 24 2024, 8:15 AM
Unknown Object (File)
Oct 24 2024, 8:13 AM
Unknown Object (File)
Oct 7 2024, 2:32 PM
Unknown Object (File)
Oct 7 2024, 2:32 PM
Subscribers

Details

Summary

The Docker host can generate backups, but the backups are useless if they're stored in the same place.

This script is intended to be run on a different machine. It will scp the latest backup from a source folder and make sure disk usage doesn't exceed some limit in the destination.

Test Plan
  • I copy-pasted this script to my personal server, where I ran it to scp files over from my keyserver on AWS: bash/scp-backup.sh backups backups
  • I created dummy backups in the destination to fill up disk space
  • I made sure that when disk space was exceeded, the oldest backups got deleted first
  • I made sure that the cleanup never deleted all backups (there was always one left, even if it's bigger than the limit)

Diff Detail

Repository
rCOMM Comm
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Overall logic looks good and makes sense, just some minor style fixes and should be ready to land!

keyserver/bash/scp-backup.sh
15–16 ↗(On Diff #13547)

Not that it really matters in this particular instance since the command ends right after the variable is set, but you can use a here string instead of the echo + pipe. In my opinion it looks cleaner and if the script did go further, you could run into some issues using echo.
{F77143}

30 ↗(On Diff #13547)

Good practice to double quote to prevent globbing and word splitting. Not going to go through every example, but there are other instances of variables that should be double-quoted on lines 15, 16, and 17.

40 ↗(On Diff #13547)

Another minor thing, but couldn't we combine the grep + awk command into one awk command by passing total as a pattern in awk? Not sure about performance, but I don't think it's any worse and it looks cleaner to me.

Also curious why we need the -c flag on the du command? Seems like (at least in my limited experience) -cs just prints the total number of bytes twice (except with a total describing one line). Is this just a sanity check or can this be removed? If it can be removed, we don't need the /total/ pattern match in the awk command.

44 ↗(On Diff #13547)

Another suggestion: since ls is intended to be human-readable, its output could change in the future. For this example it probably isn't necessary but just in case. Use find instead of ls to better handle non-alphanumeric filenames.

You could maybe try something like this:
find "$SCP_DEST_FOLDER" -maxdepth 1 -name "comm.*.sql.gz" | wc -l

This revision now requires changes to proceed.Jun 17 2022, 10:05 AM

Great review!!

keyserver/bash/scp-backup.sh
15–16 ↗(On Diff #13547)

Looks like the attachment got dropped?

keyserver/bash/scp-backup.sh
15–16 ↗(On Diff #13547)

Huh, that's weird, I see it on my page (even after refreshing) but it isn't showing up if I open this diff in a window where I'm not signed in to Phabricator. Could be related to the "File not attached"/"referenced file" issue @atul was talking about the other day.

Let me know if you see it now, if not, it was just a screenshot of my shell showing that the cut command with the here string produced equivalent output to the echo + pipe:
{F77206}

Looks good to me! Accepting, assuming the Test Plan was retried on the latest version of the script.

This revision is now accepted and ready to land.Jun 18 2022, 12:53 PM
keyserver/bash/scp-backup.sh
40 ↗(On Diff #13558)

Aside, but any thoughts on why we need the -c flag in this du command? It prints the same information as du -s but with an extra total line? The reason I am asking is because then the code could be modified to the Suggested Edit, removing the need for the /total/ pattern match.

I mentioned this in an inline comment, but wanted to hear your thoughts on why the -c was left in. Thanks!

Oops, I had a comment queued up but I totally forgot to send it! The point of -c to du is so that we can sum up the results of files that fit the /dev/shm/comm.*.sql.gz pattern. It's not necessary if you just want the size of a directory, but since we want all of the files of a certain pattern in a directory, we need -c so that the total line shows up at the bottom.

keyserver/bash/scp-backup.sh
40 ↗(On Diff #13547)

-c is what makes the total line show up.

Compare these results:

ashoat@server [~]$ sudo du -cs /dev/shm/comm.*.sql.gz
1651628	/dev/shm/comm.2022-06-12-16:00.sql.gz
1664880	/dev/shm/comm.2022-06-12-20:00.sql.gz
1664880	/dev/shm/comm.2022-06-13-00:00.sql.gz
1667488	/dev/shm/comm.2022-06-13-04:00.sql.gz
1667512	/dev/shm/comm.2022-06-13-08:00.sql.gz
1669956	/dev/shm/comm.2022-06-13-12:00.sql.gz
1670820	/dev/shm/comm.2022-06-13-16:00.sql.gz
1670832	/dev/shm/comm.2022-06-13-20:00.sql.gz
1670844	/dev/shm/comm.2022-06-13-23:00.sql.gz
14998840	total
ashoat@server [~]$ sudo du -s /dev/shm/comm.*.sql.gz
1651628	/dev/shm/comm.2022-06-12-16:00.sql.gz
1664880	/dev/shm/comm.2022-06-12-20:00.sql.gz
1664880	/dev/shm/comm.2022-06-13-00:00.sql.gz
1667488	/dev/shm/comm.2022-06-13-04:00.sql.gz
1667512	/dev/shm/comm.2022-06-13-08:00.sql.gz
1669956	/dev/shm/comm.2022-06-13-12:00.sql.gz
1670820	/dev/shm/comm.2022-06-13-16:00.sql.gz
1670832	/dev/shm/comm.2022-06-13-20:00.sql.gz
1670844	/dev/shm/comm.2022-06-13-23:00.sql.gz