Monday, July 23, 2007

Backups, pt 1

We all need to do backups of our computers. Most of us don't. Let's take a brief look at the typical backup options available to us.

1. Manual backup to removable media, such as tape, CD, DVD, etc. This has been the traditional way of doing backups. The rationale (at least, until recently) was that removable media could store more data than hard drives. For example, about ten years ago, a high-end tape cartridge could store about 8GB of data whereas the largest hard drive you could reasonably obtain was about 4GB. Also, backups were supposed to be write-mostly, meaning that backups were to be read very infrequently. The linear nature of tape meant that reads were very slow, but that was a small price to pay for the relatively low cost of the cartridges.

Nowadays, almost all removable media hold less than a hard drive, so this option isn't good for comprehensive backup solutions. Still, it's the preferred method among casual backer-uppers for keeping a copy of documents, photos, and so forth.

2. Automated backup to another hard drive. Given how ridiculously cheap hard drives are these days, this is my current method of doing backups. The backup drive is simply a large, cheap hard drive that keeps a copy of the data on my computer.

3. Automated backup to a network service. I like the concept of these backup services (Mozy, .Mac, Amazon S3). The storage is relatively cheap and the service itself presumably keeps backups of its own, so you're very unlikely to lose your data. All your files are kept off-site, a big plus if your home or office is lost in a fire or flood.

For the automated solutions, there's a problem: the backups are performed at periodic intervals, such as every night or once a week. But the time you really need your backup is immediately after you realize you just deleted an important file. Ideally, the backup system should be making backups within minutes of making any filesystem change, whether it be adding, deleting, or modifying a file.

One easy fix that fits within the realm of automated, periodic backups is simply to reduce the backup interval. How about every five minutes? That would be swell. But that begets another problem: the actual time it takes to perform a backup may well exceed the backup interval! You see, typical backup applications figure out which files have changed by scanning all of your files and comparing each file's modification time to the time of the last backup. My desktop computer has nearly a million files on it; scanning all of them takes a long, long time. It causes my computer to slow to a crawl, too, as system resources are consumed.
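
To make that scan-and-compare step concrete, here is a minimal sketch in Python. It is only an illustration of the general idea, not how any particular backup product works; the function name find_changed_files and its parameters are made up for this example. Notice that it has to visit and stat every single file, even if only one of them changed, which is exactly why the scan gets so slow on a big disk.

```python
import os
import tempfile


def find_changed_files(root, last_backup_time):
    """Return paths under root whose modification time is newer than
    last_backup_time (seconds since the epoch).

    Hypothetical helper for illustration only: every file is visited
    and stat'ed, so the cost grows with the total number of files,
    not with the number of files that actually changed.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)


if __name__ == "__main__":
    # Tiny demo: two files, one "older" and one "newer" than the last backup.
    with tempfile.TemporaryDirectory() as root:
        old_path = os.path.join(root, "old.txt")
        new_path = os.path.join(root, "new.txt")
        for path in (old_path, new_path):
            with open(path, "w") as f:
                f.write("data")
        os.utime(old_path, (1000, 1000))  # modified before the backup
        os.utime(new_path, (3000, 3000))  # modified after the backup
        print(find_changed_files(root, last_backup_time=2000))
```

With a million files, that loop means a million stat calls on every run, no matter how little has changed since the last backup.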

I hate doing backups. I dread seeing the pop-up window that appears, letting me know that backups are about to begin. My system will become unusable for about an hour. Ugh.

How can we speed it up and make it better? We need help from the operating system itself, something that most OSs don't natively provide to applications. Apple has a solution in its upcoming Leopard release. We'll talk about it in a future post.