Command Line Options
path to the configuration file
test mode; report what would be done, but don't actually do anything
force backup even if it has already been done today (use with caution; this may overwrite the current diff and thereby lose data)
print debug messages
First off you'll need some general configuration statements:
a directory where tar can store its diff files (necessary for incremental backups).
to store MySQL dumps or other content that can't be streamed to S3 without risking timeouts. Only matters if UseTempFile is set.
the destination bucket on S3.
the address of the public key to use for encryption. The gpg keyring of the user you're running the script as (i.e., root, for a systemwide cron job) must contain a matching key. If this option is not specified, no encryption is performed.
path to the keyring file containing the public key for the GpgRecipient. Defaults to ~/.gnupg/pubring.gpg
the file containing your AWS authentication keys.
the size of the chunks to be stored on S3, in bytes.
Then you can specify as many directories, databases, and repositories as you like to be backed up. These may be contained in <Cycle> blocks, for the sake of reusing timing configuration, or may be blocks themselves with individual timings.
<name>: a unique identifier for the cycle. This is not used except to establish the uniqueness of each block.
Use "SimpleCycle" (default) to make backups at regular intervals (e.g., every day) with no decaying behavior. Use "HanoiCycle" for a decaying schedule, i.e. to store a few old backups and a lot of recent ones (thanks to Scott Squires for implementing HanoiCycle).
Frequency (when using either cycle type)
how often a backup should be made at all, in days.
Phase (when using SimpleCycle only)
Allows adjusting the day on which the backup is made, with respect to the frequency. Can take values 0 <= Phase < Frequency; defaults to 0. This can be useful, for instance, if you want to alternate daily backups between two backup sets. This can be accomplished by creating two nearly identical backup specifications, both with Frequency 2, but where one has a Phase of 0 and the other has a Phase of 1.
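The interaction between Frequency and Phase can be sketched in a few lines of Python. This is an illustration only, not the script's actual code: it assumes days are numbered consecutively and that a backup is due when the day number modulo Frequency equals Phase. The function name `is_backup_day` and the day numbering are hypothetical.

```python
def is_backup_day(day: int, frequency: int, phase: int = 0) -> bool:
    """Return True if a backup is due on the given day number.

    Assumption (not taken from the script): a backup is due when
    day % frequency == phase.
    """
    if frequency < 1:
        raise ValueError("Frequency must be at least 1")
    if not 0 <= phase < frequency:
        raise ValueError("Phase must satisfy 0 <= Phase < Frequency")
    return day % frequency == phase


# Two nearly identical backup sets, both with Frequency 2,
# one with Phase 0 and the other with Phase 1.
days = range(6)
set_a = [d for d in days if is_backup_day(d, frequency=2, phase=0)]
set_b = [d for d in days if is_backup_day(d, frequency=2, phase=1)]
print(set_a)  # [0, 2, 4]
print(set_b)  # [1, 3, 5]
```

Under this assumption the two sets never run on the same day, and together they cover every day.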
Diffs (when using SimpleCycle only)
tells how many incremental backups to make between full backups. E.g., if you want daily diffs and weekly fulls, set this to 6.
Fulls (when using SimpleCycle only)
tells how many total cycles to keep. This should be at least 2. With only one slot, you'd have no protection while a backup is running, since the old contents of the slot are deleted before the new contents are written.
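To see how Diffs and Fulls combine, here is a rough Python sketch. It assumes that each cycle consists of one full backup followed by Diffs incremental backups, and that cycles rotate through Fulls slots in order. The function `classify_backup` and the slot numbering are hypothetical and may not match how the script names or orders its slots.

```python
from typing import Tuple


def classify_backup(n: int, diffs: int, fulls: int) -> Tuple[str, int]:
    """Classify the n-th backup run (counting from 0).

    Assumption: a cycle is one full backup followed by `diffs`
    incremental backups, and cycles rotate through `fulls` slots.
    Returns (kind, slot) where kind is "full" or "diff".
    """
    if diffs < 0 or fulls < 1:
        raise ValueError("Diffs must be >= 0 and Fulls must be >= 1")
    cycle_length = diffs + 1
    kind = "full" if n % cycle_length == 0 else "diff"
    slot = (n // cycle_length) % fulls
    return kind, slot


# Daily diffs and weekly fulls (Diffs 6), keeping two cycles (Fulls 2).
schedule = [classify_backup(n, diffs=6, fulls=2) for n in range(15)]
for n, (kind, slot) in enumerate(schedule):
    print(n, kind, slot)
```

With these settings, run 0 is a full backup in slot 0, runs 1 to 6 are diffs against it, run 7 starts slot 1 with a new full, and run 14 returns to slot 0. While slot 0 is being rewritten, slot 1 still holds a complete cycle, which is why Fulls should be at least 2.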
Discs (when using HanoiCycle only)
The number of full backups to keep on a decaying schedule (e.g., setting this to 4 should provide backups that are one day, two days, four days, and eight days old, more or less depending on the current day relative to the Hanoi rotation). Managing incremental backups on a decaying cycle would be very messy, so all backups using the Hanoi cycle are full backups, not diffs.
ArchiveOldestDisc (when using HanoiCycle only) (default false)
If all the slots specified by the Discs parameter are in use and the new backup would overwrite the oldest slot, keep an archive copy of the oldest backup. For example, with 5 discs, this would result in having recent backups of 1, 2, 4, and 8 days ago, and archived backups every 16 days (16, 32, 48, etc. days ago). Using an analogy to backup tapes, this is like removing the tape with your oldest backup after each full cycle, putting it into storage, and adding a fresh tape into the rotation.
This causes the total volume of backup data to grow indefinitely. Depending on your needs, it may make sense to use a fairly large number of discs, so as to keep a few very old backups while rarely triggering the archival condition.
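The decaying schedule can be illustrated with the textbook Tower of Hanoi rotation. The sketch below assumes that scheme (the disc used on a given day is the number of trailing zero bits in the day number, capped at the last disc); the script's own HanoiCycle implementation may differ in detail. The function names `disc_for_day` and `backup_ages` are hypothetical.

```python
from typing import List


def disc_for_day(day: int, discs: int) -> int:
    """Return the disc index (0-based) written on the given day (day >= 1).

    Assumption: standard Tower of Hanoi rotation, where the disc index
    is the number of trailing zero bits in the day number, capped at
    the last disc.
    """
    if day < 1 or discs < 1:
        raise ValueError("day and discs must both be at least 1")
    trailing_zeros = (day & -day).bit_length() - 1
    return min(trailing_zeros, discs - 1)


def backup_ages(today: int, discs: int) -> List[int]:
    """Ages in days of the backups held just before today's backup runs."""
    last_used = {}
    for day in range(1, today):
        last_used[disc_for_day(day, discs)] = day
    return sorted(today - day for day in last_used.values())


print(backup_ages(16, discs=4))  # [1, 2, 4, 8]
print(backup_ages(19, discs=4))  # [1, 2, 3, 7]
```

The second line of output shows the "more or less" caveat: the exact ages shift from day to day as the rotation progresses, but the spacing stays roughly exponential.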
Directory <name> or <Directory name>
<name>: a directory to be backed up. May appear as a property within a cycle block, or as a block in its own right, e.g. <Directory /some/path>. The latter case is just a shorthand for a cycle block containing a single Directory property.
MySQL <databasename> or <MySQL databasename>
In order for this to work, the user you're running the script as must be able to mysqldump the requested databases without entering a password. This can be accomplished through the use of a .my.cnf file in the user's home directory. <databasename> names a single database to be backed up, or "all" to dump all databases. The Diffs property is ignored, since MySQL dumps are always "full".
PostgreSQL <databasename> or <PostgreSQL databasename>
In order for this to work, the user you're running the script as must be able to pg_dump the requested databases without entering a password. This can be accomplished through the use of a .pgpass file in the user's home directory. <databasename> names a single database to be backed up, or "all" to dump all databases. The Diffs property is ignored, since PostgreSQL dumps are always "full".
Subversion <repository> or <Subversion repository>
In order for this to work, the user you're running the script as must have permission to svnadmin dump the requested repository. <repository> names a single svn repository to be backed up. Incremental backups are handled by storing the latest backed-up revision number in a file under DiffDir. As elsewhere, setting Diffs to 0 (or just leaving it out) results in a full dump every time. (Thanks to Kevin Ross for adding the incremental behavior here.)
SubversionDir <repository-dir> or <SubversionDir repository-dir>
<repository-dir>: a directory containing multiple Subversion repositories, all of which should be backed up. (This feature was inspired by http://www.hlynes.com/2006/10/01/backups-part-2-subversion.)
Causes the data being backed up to be dumped to a local file before being streamed to S3. Set to 0 or 1. This is most useful in a MySQL block, because the slow upload speed to S3 can cause mysqldump to time out when dumping large tables. Letting mysqldump write to a temp file before uploading avoids this problem. An alternative solution is to set long mysqld timeouts in my.cnf:
    net_read_timeout=3600
    net_write_timeout=3600
Raising the timeouts may be the right solution in some circumstances, e.g. if the databases are larger than the available scratch disk. The UseTempFile setting also works for regular filesystem backups and Subversion backups, at the cost of (temporary) disk space and more disk activity.