Addendum to "Time Machine for every Unix out there"

My article about using rsync to mimic the behavior of Apple’s Time Machine generated a lot of traffic and, more importantly, a lot of feedback.

In this article I’ll summarize and try to clarify a few things.

First of all: I didn’t invent any of the things I’ve written in the article! All I did was write it up in a way I could understand. The rsync developers included these features for a specific purpose, which might be exactly what we use them for now. But this was obvious, I hope.

Time Machine and Performance

Ars Technica has a very in-depth review of the new features of Leopard, including Time Machine.

Interestingly, Time Machine does much the same as my script: each backup gets its own folder, and there is a link to the latest one.

Some readers complained that Time Machine looks only at the changed files, whereas my script has to look at each file on the disk. For me, the difference is negligible. Apple introduced hard links to directories, which saves a lot of time and a bit of space. Unfortunately this feature isn’t available on plain Unix, but it’s a neat trick!

Restoring the Files

Since we make full backups each time, every backup directory contains your complete directory structure.

Restoring a file is merely a simple copy command.
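As a minimal sketch of such a copy, assuming the backup directory is reachable as a local path (for example, you are logged in on the backup host): the function name restore_file and the paths in the usage comment are made up for illustration.

```shell
#!/bin/sh
# restore_file SNAPSHOT RELPATH DEST
# Copy one file out of a backup snapshot. The function name and all paths
# are illustrative assumptions, not part of the original script.
restore_file() {
    snapshot=$1   # e.g. /Backup/current or one of the back-* directories
    relpath=$2    # path of the file inside the snapshot
    dest=$3       # where the restored copy should go
    # -p keeps the modification time and permissions of the backed-up file.
    cp -p "$snapshot/$relpath" "$dest"
}

# Usage (placeholder paths):
# restore_file /Backup/current home/me/notes.txt /home/me/notes.txt
```

Pointing SNAPSHOT at an older back-* directory instead of current restores an earlier version of the same file.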

Or, if you want to restore the whole filesystem, use:

rsync -avz --delete host:/Backup/current/ /target/

But keep in mind that --delete removes any file in the target which is not in the backup!
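Because of that, it may be worth looking at what rsync would do before letting it loose. A small sketch using rsync's dry-run option (-n); the function name preview_restore is an assumption of mine, and the paths are the same placeholders as above.

```shell
#!/bin/sh
# preview_restore SRC DEST
# Show what a full restore would transfer and delete, without touching DEST.
# The function name is an illustrative assumption.
preview_restore() {
    # -n (--dry-run) makes rsync only report what it would do.
    rsync -avzn --delete "$1" "$2"
}

# Usage (placeholder paths):
# preview_restore host:/Backup/current/ /target/
```

Lines in the output that start with "deleting" name the files that would be removed from the target. If the list looks right, run the same command without -n.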

Removing old Backups

Another huge advantage of full backups is: you can delete any backup you don’t need anymore.

Of course each backup directory takes space, if only for the hard links and the directory structure. Cleaning up the backups is a breeze: just delete the directories you don’t need anymore. For example, you could delete all directories older than two months, or keep every backup from the last week and only the Monday backups before that. There are plenty of possibilities.
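One of those possibilities, sketched under a few assumptions: instead of the age-based policies mentioned above, this simpler variant keeps the newest N backups and lists everything older. It relies on the back-YYYY-MM-DDTHH_MM_SS names sorting chronologically as plain text, and on a find that understands -maxdepth (GNU and BSD find do). The function name list_prunable is made up.

```shell
#!/bin/sh
# list_prunable DIR KEEP
# Print every back-* directory in DIR except the KEEP newest ones.
# Nothing is deleted here; the function only lists candidates.
list_prunable() {
    # -type d skips the "current" symlink; sort puts the oldest names first.
    find "$1" -maxdepth 1 -type d -name 'back-*' | sort |
        awk -v keep="$2" '
            { line[NR] = $0 }
            END { for (i = 1; i <= NR - keep; i++) print line[i] }'
}

# Usage (placeholder path): review the list first, then delete.
# list_prunable /Backup 30
# list_prunable /Backup 30 | while read -r dir; do rm -rf "$dir"; done
```

Listing and deleting are kept separate on purpose, so the list can be checked before anything is removed.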

Other Tools

Many people suggested many interesting tools for doing backups. My argument against using such tools is that rsync is everywhere!

You find rsync in the smallest Linux distributions (like DSL), on PDAs like the Zaurus, on Windows using Cygwin and also on Nokia’s Internet Tablets.

One backup strategy for all your devices!

Space Requirements for the Links

Nothing in life is free, but the space used by the hard links for the backup is marginal. It also depends on the filesystem you use; ReiserFS seems to do a good job of packing more links into one cluster. But hey, these are backups: you throw away space anyway!

Colon

There is one particular problem with the colons in the directory names the script creates: some operating systems don’t like them.

Here is an updated version of the script without colons:


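 #!/bin/sh
 # SOURCE, HOST and PATHTOBACKUP are placeholders; set them for your setup.

 # Time stamp with underscores instead of colons.
 date=`date "+%Y-%m-%dT%H_%M_%S"`
 rsync -azPE --link-dest=PATHTOBACKUP/current "$SOURCE" "$HOST:PATHTOBACKUP/back-$date" \
   && ssh "$HOST" "rm -f PATHTOBACKUP/current \
   && ln -s back-$date PATHTOBACKUP/current"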

Another slight change in the script is that all commands are connected with &&. This means the next command runs only if the previous command returned no error.

Of course, if anything messes up the “current” link, the script just makes another full backup without the links to the previous version.
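To see the && behavior in isolation, here is a tiny stand-alone sketch. The step function is a made-up stand-in for the real commands (rsync, rm, ln); it just prints its name and returns the status it is given.

```shell
#!/bin/sh
# step NAME STATUS: announce itself, then succeed (0) or fail (non-zero).
# A hypothetical stand-in for the commands in the backup script.
step() {
    echo "running $1"
    return "$2"
}

# "two" fails, so "three" is never started and the chain reports the failure.
step one 0 && step two 1 && step three 0
echo "exit status of the chain: $?"
```

This prints "running one" and "running two", then an exit status of 1; "running three" never appears. In the backup script this means a failed rsync leaves the "current" link pointing at the last good backup.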

Extended Attributes on OS X

One user suggested the addition of the -E parameter on Mac OS X. If you use the parameter on non-OS X machines it’s more or less redundant (-a contains executability), but on OS X some additional information is backed up.

Just use it.

Windows

Some users reported success with this strategy on Windows. I’m happy that we now have a unified backup strategy for all major operating systems without paying a single cent.

You have to use Cygwin to install rsync (there are also binaries on the net that work without the whole Cygwin installation).

Measuring the Space Requirements

To look up how much space your particular backups need, simply issue du -shc back-*. This prints each backup directory with its actual disk usage:

  me@myserver:Backups % du -shc back-*
  406M  back-2007-11-12T12:00:00
  6.8M  back-2007-11-13T20:28:56
  189M  back-2007-11-13T23:42:54
  2.6G  back-2007-11-14T08:15:43
  1.8M  back-2007-11-15T20:07:07
  7.9M  back-2007-11-18T23:57:09
  11M   back-2007-11-19T21:53:34
  2.6M  back-2007-11-21T18:57:40
  2.1M  back-2007-11-21T19:11:02
  11M   back-2007-11-22T01:19:38
  1.9M  back-2007-11-22T11:05:17
  19M   back-2007-11-22T18:37:56
  7.4M  back-2007-11-22T22:45:04
  8.5M  back-2007-11-22T22:51:29
  9.2M  back-2007-11-24T12:45:12
  4.2M  back-2007-11-24T17:30:38
  11M   back-2007-11-25T00:41:19
  1.9M  back-2007-11-27T20:41:40
  317M  back-2007-12-03T10:19:07
  12M   back-2007-12-05T11:51:19
  3.6G  total
  me@myserver:Backups %

As you can see, the smallest backup (obviously without any changed file) requires 1.8MB.
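Why an unchanged backup is so small can be tried out with a throwaway experiment. The sketch below fakes two "backups" in a temporary directory: the second one contains only a hard link to the file in the first, which is what --link-dest produces for unchanged files. All names (demo, data.bin) are made up for the demonstration.

```shell
#!/bin/sh
# Demonstration only: everything happens in a throwaway temp directory.
demo=$(mktemp -d)
mkdir "$demo/back-1" "$demo/back-2"

# "First backup": one real file of 1 MB.
dd if=/dev/zero of="$demo/back-1/data.bin" bs=1024 count=1024 2>/dev/null

# "Second backup": the unchanged file is hard-linked instead of copied.
ln "$demo/back-1/data.bin" "$demo/back-2/data.bin"

# The same inode number in both directories means the data is stored once.
ls -i "$demo/back-1/data.bin" "$demo/back-2/data.bin"

# Within one du run, the shared file is counted only where it is seen first.
(cd "$demo" && du -shc back-*)

# Clean up with: rm -rf "$demo"
```

The exact sizes du reports depend on the filesystem, but back-2 should show up with little more than the cost of its directory entry.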

Backups are typically the last line of defense against data loss, and consequently the least granular and the least convenient to use.
— Yosemite Technologies