Every once in a while I see posts on mailing lists where people wonder about doing remote backups. I figured it might be worthwhile to describe how I have been doing my home workstation backups for the last few years. Hopefully this will be useful to someone.
I consider a backup system that requires frequent manual attention pretty much useless, mainly because it is hard to maintain proper backup discipline when busy or away. This is especially true of media-based backups. Swapping tapes or CDs to make a backup is annoying enough that the backup probably won’t get done as often as it should. Networks allow the backup process to be automated by having each system back itself up regularly and automatically to another host on the network. However, making backups to another host in the same building doesn’t help much when the building burns down. If you have computer equipment at two secure locations with a large pipe between them, automatic off-site backups are pretty easy. Unfortunately, most individuals are not this lucky. However, with the proliferation of broadband it is quite possible that you know someone with a big enough pipe that you could send backup data to them during off hours.
This remote computer may be owned by your best friend, but do you really want to make your backup data available to them? Even if you do trust this person, maybe they don’t look after their machine as well as you do; their computer could be cracked, and so on. Clearly the remote system needs to be considered untrusted, so the data is going to have to be encrypted.
My backup script basically does the following (a rough sketch of the pipeline appears after the list):
- Create a tarball of all of the data to back up.
- bzip2 the file to make the transfer shorter.
- Run GnuPG to encrypt the file to myself.
- Use SCP to transfer the file to the remote system.
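The core of this can be sketched in a few lines of shell. This is only a rough, pipelined outline, not the actual backup.sh; the directory list, key ID, remote host, and paths are placeholders you would replace with your own settings:

```sh
#!/bin/sh
# Rough sketch of the backup pipeline (not the real backup.sh).
# BACKUP_DIRS, KEY_ID, REMOTE_HOST and REMOTE_DIR are placeholders.
BACKUP_DIRS="/home/user/docs /home/user/mail"
KEY_ID="you@example.com"
REMOTE_HOST="friend.example.com"
REMOTE_DIR="backups"
FILE="/tmp/backup-$(date +%Y%m%d).tar.bz2.gpg"

# Tar and bzip2 the data, then encrypt the stream to your own key.
tar -cjf - $BACKUP_DIRS | gpg --encrypt --recipient "$KEY_ID" --output "$FILE"

# Copy the encrypted archive to the remote host.
scp "$FILE" "$REMOTE_HOST:$REMOTE_DIR/"
```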
Thus, this requires that you have an OpenPGP key (via GnuPG) and access via SSH (SCP) to the remote host. Transferring the file with some other, less secure method shouldn’t reduce the security of the system too much. The only problem would be if someone sniffed your authentication information and then deleted the files from the remote host. Since the files are encrypted, downloading them doesn’t do the bad guy any good.
This system is not suited to backing up your media library, mostly because of bandwidth limitations but also because incremental backups are not possible; the entire backup is sent every time.
Though the point of this entry was just to put the idea of doing backups this way out there for Google to index, I have made a copy of my backup.sh available. The script is quite simple but should provide a good starting point for anyone interested in taking the implementation further. This particular script is set up to do daily and weekly backups. It has two configuration options that specify plain-text files containing lists of directories to exclude from the daily and weekly backups (see man tar for the exclude file format). What I do is exclude everything but frequently changing directories from the daily backup and only exclude media directories from the weekly.
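As an illustration, a daily exclude file might look something like the following. The paths are made up, and the exact pattern syntax is whatever your tar’s --exclude-from option accepts:

```
/home/user/music
/home/user/photos
/home/user/video
/home/user/archive
```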
There is one obvious catch-22 with this system. Your GnuPG keys are stored in ~/.gnupg, and this directory is backed up and encrypted by these keys. If your computer is lost, the only copy of your data you have left is encrypted, and you now have no way to decrypt your backup. So, you need to keep a separate backup copy of your GnuPG keys somewhere else. Since you have a passphrase on your key (you had better, anyway), these files are already encrypted.
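One way to keep that separate copy is to export your key pair to a couple of files and stash them somewhere safe (a USB stick, a CD in a drawer). The address and filenames below are just examples:

```sh
# Export the public and secret keys in ASCII-armored form.
# "you@example.com" and the output filenames are placeholders.
gpg --armor --export you@example.com > public-key.asc
gpg --armor --export-secret-keys you@example.com > secret-key.asc
```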
In order to make this backup system automatic (and hence useful) it needs to be able to transfer the backup file without user intervention. With SCP this can be accomplished by creating an un-passworded SSH key pair. This allows the host that holds the keys to log in to the remote host without a password, i.e. without user intervention. Then the SSH_OPTIONS variable in the script can be modified to point SCP to this key. Now you can set up the script as a cron job and get your backups done automatically every night. MD5 sums are used to verify the successful transfer of the backup, and the script will email you if the backup fails.
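Setting that up looks roughly like this. The key path, user, host, and script path are placeholders; SSH_OPTIONS is the variable mentioned above:

```sh
# Generate an un-passworded key pair and install the public half on
# the remote host (key path, user and host are placeholders).
ssh-keygen -t rsa -N "" -f ~/.ssh/backup_key
ssh-copy-id -i ~/.ssh/backup_key.pub user@remote.example.com

# In backup.sh, point SCP at the new key, e.g.:
#   SSH_OPTIONS="-i $HOME/.ssh/backup_key"

# Example crontab entry (crontab -e) to run the backup at 3 a.m.:
# 0 3 * * * /home/user/bin/backup.sh
```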
This script could be made a bit smarter so that it deletes old backups from the remote host; it does not do that right now. You’ll have to log in to the remote host once in a while and delete old backups yourself. How often you need to do this depends on how much space the remote host has available.
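Until the script learns to clean up after itself, something like the following will prune anything older than 30 days. The user, host, directory, filename pattern, and age are all placeholders:

```sh
# Remove backups older than 30 days from the remote backup directory.
ssh user@remote.example.com \
    "find backups/ -name 'backup-*' -mtime +30 -exec rm {} \;"
```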