How I do offsite backups for mastodon.nl

 · Systeemkabouter

As part of the Mastodon Server Covenant that we follow for mastodon.nl, we need backups. Of course we need backups.

Rsync at first

So we have backups. Since June 2023 we've had offsite backups. This means the backup data is stored at another physical location and, in this case, even with a different hosting solution.

The backups in 2023 were implemented in the easiest/cheapest way: rent a storage box somewhere that allows the rsync protocol and just sync everything daily.

This kind of worked, but it was CPU intensive for the storage host, and it was somewhat involved to check whether the backups had actually run and were complete.

Evolving to snapshot based

The storage host we use at mastodon.nl is a Debian VM with a large ZFS storage volume attached. We already use this for local snapshots when we do something tricky like a cleanup or a major upgrade. But it is also entirely possible to create a ZFS snapshot and push it to another host that also runs a ZFS filesystem.
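A minimal sketch of what a first, full replication could look like. The script at the end of this post only sends incremental deltas, so the receiving side needs one complete snapshot to start from. The dataset and target names are taken from that script; the remote user and host are hypothetical, and the function only prints the commands rather than running them.

```shell
#!/bin/bash
# Dry-run sketch: prints the commands for a first, full replication
# instead of executing them.

DATASET="matoke"                      # dataset name used in the script in this post
TARGET="backups/mastodon_nl"          # receive target used in the script in this post
REMOTE="backup@backup.example.org"    # hypothetical user and host

# Print the commands that create a snapshot named $1 and send it in full.
initial_send_commands() {
  local snapshot="${DATASET}@$1"
  echo "zfs snapshot ${snapshot}"
  # No -i here: without a base snapshot, zfs send emits the complete stream.
  echo "zfs send ${snapshot} | ssh ${REMOTE} zfs receive ${TARGET}"
}

initial_send_commands "initial"
```

Once such a full snapshot exists on both sides, every later run only needs to send the difference between the previous and the new snapshot.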

So this is what we have relied on since late December 2023. The storage target that was rented did not use ZFS, so it was of no use in this new setup. Instead, the ZFS snapshots are pushed to a device in my home lab. As I have a 1 Gb/s fiber link, even restoring should be doable in the case of a major failure of the production system. But a possible scenario could also be shipping the physical device to the datacenter for quicker recovery.

The snapshot mechanism was kept fairly simple with some shell scripting of my own, with the intention of looking at more polished options at a later stage. But the shell solution has been running fairly stably, so for now I will leave it as it is.

What level of protection is in place?

Right now, snapshots of the media are created 3-4 times a day and pushed offsite, which limits potential media loss to a matter of hours. The database, on the other hand, only gets a full dump once a day, which is also pushed offsite. I intend to improve this in the future by running the database on top of ZFS too, to allow snapshot-based offsite backups.
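With a schedule like this, one simple way to verify that backups actually ran is to check the age of the newest snapshot. The sketch below is an illustration, not part of the setup described here. It assumes snapshot names carry a timestamp in the `date +%F-%H-%M` format used by the script at the end of this post, that GNU date is available (as on Debian), and that 8 hours is an acceptable maximum age; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: compute the age of a snapshot from the timestamp in its name.
# Assumes names like matoke@2024-01-05-06-00 and GNU date (as on Debian).

# Print the age in whole hours of snapshot $1 relative to epoch time $2.
snapshot_age_hours() {
  local stamp="${1#*@}"                                   # 2024-01-05-06-00
  local when="${stamp:0:10} ${stamp:11:2}:${stamp:14:2}"  # 2024-01-05 06:00
  local created
  created=$(date -d "${when}" +%s) || return 1
  echo $(( ($2 - created) / 3600 ))
}

# Succeed when snapshot $1 is at most $2 hours old at epoch time $3.
is_fresh() {
  local age
  age=$(snapshot_age_hours "$1" "$3") || return 1
  [ "${age}" -le "$2" ]
}

# Example with a fixed "now"; in real use pass $(date +%s) and the newest
# name from: zfs list -H -t snapshot -o name -s creation matoke | tail -1
now=$(date -d "2024-01-05 12:00" +%s)
if is_fresh "matoke@2024-01-05-06-00" 8 "${now}"; then
  echo "latest snapshot is fresh"
else
  echo "latest snapshot is stale"
fi
```

Running the same check against the snapshot list on the receiving host would confirm that the push arrived, not just that the snapshot was created locally.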

Cost

The externally rented storage box cost about EUR 13 per month. I think it is reasonable to charge mastodon.nl the same amount for the backups to my home lab as long as I provide the backup target.

Retention

I intend to keep 2 weeks of snapshots offsite. Given the volatile nature of social media, having only two weeks of backup history (which still includes the full site history within the backup) seems totally reasonable.
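A two-week retention window could be enforced along the lines of the sketch below. It is a dry run that only prints `zfs destroy` commands, and it is not the cleanup method used for mastodon.nl. It assumes snapshot names end in a timestamp in the `date +%F-%H-%M` format and that GNU date is available; the function name is hypothetical.

```shell
#!/bin/bash
# Dry-run sketch: list snapshots older than the retention window.
# Assumes names like matoke@2024-01-05-06-00 and GNU date (as on Debian).

RETENTION_DAYS=14   # the two weeks mentioned above

# Read snapshot names on stdin; print "zfs destroy <name>" for every
# snapshot older than RETENTION_DAYS relative to epoch time $1.
expired_snapshot_commands() {
  local now="$1" cutoff name stamp created
  cutoff=$(( now - RETENTION_DAYS * 86400 ))
  while read -r name; do
    stamp="${name#*@}"
    created=$(date -d "${stamp:0:10} ${stamp:11:2}:${stamp:14:2}" +%s) || continue
    if [ "${created}" -lt "${cutoff}" ]; then
      echo "zfs destroy ${name}"
    fi
  done
}

# Example with fixed input; in real use feed it the output of
#   zfs list -H -t snapshot -o name <dataset>
# on the host being cleaned up, and pass $(date +%s).
printf '%s\n' "matoke@2024-01-01-06-00" "matoke@2024-01-20-06-00" |
  expired_snapshot_commands "$(date -d "2024-01-21 00:00" +%s)"
```

Printing the commands first makes it easy to review what would be removed. Whatever cleanup is used, the most recent snapshot present on both hosts has to stay, because the next incremental send depends on it.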

Shell script

To automate the creation and pushing of snapshots, I created a small shell script. It is provided here for your perusal/inspiration/re-use.

#!/bin/bash

function report()
{
  /usr/bin/logger "$1"
  /usr/bin/echo "$1"
}

# Most recent existing snapshot (-H: no header, -o name: names only, -s creation: oldest first).
LAST_SNAPSHOT=$(/usr/sbin/zfs list -H -t snapshot -o name -s creation matoke | /usr/bin/tail -1)
NEXT_SNAPSHOT="matoke@$(date +%F-%H-%M)"

report "Creating snapshot ${NEXT_SNAPSHOT} and pushing delta with ${LAST_SNAPSHOT} offsite"

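/usr/sbin/zfs snapshot "${NEXT_SNAPSHOT}"

# With pipefail, the pipeline fails if either zfs send or the remote zfs receive fails;
# without it, only the exit status of ssh would be checked.
set -o pipefail
if /usr/sbin/zfs send -vi "${LAST_SNAPSHOT}" "${NEXT_SNAPSHOT}" | ssh [USER]@[HOST] zfs receive backups/mastodon_nl
then
    report "Finished pushing snapshot offsite"
else
    report "Pushing snapshot failed, bummer"
fi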

I have not automated the cleanup of snapshots yet. For now I use a one-liner that I took from https://serverfault.com/questions/340837/how-to-delete-all-but-last-n-zfs-snapshots.