Data Backup
osx workflow

After this week’s discussion on Build and Analyze about data backup, I thought I would take the time to explain my backup strategy. Like Dan and Marco, my professional life revolves around data collection and computation, and my personal life is highly digitized. Losing any of this data would be catastrophic. I recently updated my backup plan to keep up with my ever-increasing need for better backup functionality. With my system, I strive for automation and simplicity to keep the backup process as frictionless and error-proof as possible.

Before explaining my system, here are the backup software and hardware tools I use:

  • Daisy Disk: Invaluable for forming a picture of your system and for determining what directories are sucking up space.
  • SuperDuper!: The Swiss Army knife of backup—extremely useful for a variety of backup tasks.
  • Time Machine: Apple's easy-to-use backup tool.
  • rsync and cron: The special sauce for file syncing and automation goodness.
  • A diff tool: Handy for comparing the differences in files. I usually reach for vimdiff or splice—your mileage may vary. Xcode's FileMerge is also a good choice.
  • Cloud backup software: More on this below, but I use CrashPlan. Amazon S3 paired with Arq is a great option if you have a small amount of data to back up or are willing to spend a bit more.
  • Local storage: Given the price of hard drive bays and enclosures, it's probably smarter to buy preconfigured hard drives bundled with proprietary enclosures for local backups. I've always had success with LaCie Hard Drives for their reliability, aesthetics, and Mac-friendliness. However, Drobo has always piqued my interest despite the obscene price.

As I see it, any sound backup plan needs at least three types of storage:

  1. Bootable backup
  2. Local backup
  3. Remote backup

My Backup System

My backup system uses a four-pronged approach to prevent data loss. The general system looks like this:

                       ___ cloud_backup___
                     /                     \
             servers                        |
            old_archives                    |                  . Time Machine
                         \                  |                 .
           active_work. . networked_hd ___ Mac ___ external_hd
                         /                  .                 \ 
           music_library                    .                   photo_library
                             Mac clone
  • Bootable Backup
    Having a bootable backup is probably the most superfluous piece of my system, but it's a wonderful luxury to have. I've lost two internal hard drives in the past ten years (a 12” G4 PowerBook circa 2004 and a 15” MacBook circa 2007), and losing my machine's hard drive really compromises my productivity. Replacing a drive means that my computer will be inoperable for days to weeks while waiting for repair. This can be really problematic, especially if I need special software or system customizations that are critical for some task.
    A bootable backup clone ensures that when my hard drive dies, I will have absolutely no downtime. When my computer's internal hard drive goes, I can attach the bootable drive and boot a perfect clone of my internal HD, then work from the bootable drive as if nothing ever happened. What's even better is that I can actually work off the bootable drive attached to another computer while my machine is being repaired.1
    My bootable solution pairs the venerable SuperDuper! with LaCie’s Starck Portable External HD. The awesome thing about the Starck is that it doesn't come with the typical AC/DC power adaptor and requisite power and USB cords. That makes it very portable and an ideal drive to carry while commuting.
  • External Hard Drive
    The least interesting piece of my setup is an external HD, which is split into two partitions. One partition houses my day-to-day Time Machine backups and the other houses my referenced Aperture photo library. I use a wired connection for both to maximize speed (here’s looking at you, Aperture).
  • Networked Hard Drive
    My current networked drive is a 2 TB LaCie d2 Quadra Hard Disk.2 This HD contains three partitions. The archive volume stores old archived files and directories that I don't typically need to access. I try to reorganize my archive volume prior to each OS upgrade. The second volume contains my music library, which I like to store on the network drive so that my music is accessible throughout the house, garage, and yard. The third volume houses active work projects. I keep this volume in sync using a Python script I wrote that wraps rsync and manages server access via SSH key authentication. Working from this drive is useful if a server is down or if I want to work from home.
  • Cloud Backup
    I think everyone should have some form of remote backup. If a disaster such as a fire, hurricane, flood, or earthquake destroyed all my local data, I would still feel safe knowing my data was protected at an off-site data storage facility. I tried quite a few cloud backup services and read numerous reviews (here, here, and here). Ultimately, I chose CrashPlan for their competitive pricing, unlimited data storage option, and the ability to back up servers without a graphical environment using a headless client. Overall, I've been extremely happy with their product, but I wish they would provide a native Mac application rather than the hideous Java-based app currently available. I currently use CrashPlan's headless client to back up several servers. The setup is simple thanks to CrashPlan's documentation and this gem. I also back up everything illustrated above in dashed lines to CrashPlan (servers, old_archives, music_library, Mac, and photo_library). The items illustrated with dots are the only data not sent to CrashPlan (active_work, bootable_hd, which is labeled Mac clone in the diagram, and Time Machine). CrashPlan can also mail me a seeded backup of my data should I ever lose all local data and need to get up and running rapidly. Given CrashPlan's unlimited data backup model, it will be interesting to see if they can accommodate the growing size of customer backups.
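
For the curious, here is a rough idea of what an rsync wrapper like the one mentioned under Networked Hard Drive could look like. This is a minimal sketch, not my actual script: the function names, the host, the paths, and the key location are all hypothetical, and the flag choices (archive mode, compression, mirroring deletions) are assumptions you should adjust to taste. It only builds and runs a one-way rsync over SSH with a specific private key.

```python
import os
import shlex
import subprocess


def build_rsync_command(source, destination, ssh_key, dry_run=False):
    """Return the rsync argument list for a one-way sync over SSH.

    All names and defaults here are illustrative assumptions.
    """
    key_path = os.path.expanduser(ssh_key)
    # -i selects the private key; BatchMode makes ssh fail instead of
    # prompting for a password, which suits unattended (cron) runs.
    ssh_cmd = "ssh -i {} -o BatchMode=yes".format(shlex.quote(key_path))
    # --delete mirrors deletions from the source, so test with dry_run first.
    cmd = ["rsync", "--archive", "--compress", "--delete", "-e", ssh_cmd]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd += [source.rstrip("/") + "/", destination]
    return cmd


def sync(source, destination, ssh_key, dry_run=False):
    """Run rsync; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(
        build_rsync_command(source, destination, ssh_key, dry_run),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical server, paths, and key; this only prints the command.
    command = build_rsync_command(
        "user@server.example.com:/srv/projects",
        "/Volumes/active_work/projects",
        "~/.ssh/id_ed25519",
        dry_run=True,
    )
    print(shlex.join(command))
```

Keeping the command construction separate from the call that runs it makes it easy to print and inspect the exact rsync invocation before letting cron run it unattended.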

  1. This feature is dependent on specific hardware architecture. I've successfully booted my MacBook Pro clone on a MacBook and another MacBook Pro, but not from a PowerPC G4 PowerBook. ↩

  2. This HD is too small for my current needs. I am eagerly awaiting an upgrade to a Thunderbolt-equipped HD. My understanding is that Thunderbolt drives should allow daisy-chaining several drives together on my network, which should accommodate future storage expansion. ↩