Using Amazon Glacier with the AWS CLI for Media Archival

If you’re like me, you might have many gigabytes of media files that you’ve accumulated over the years: video, audio, high-resolution photos and the like. In the past, I’ve used everything from stacks of floppy disks to Iomega Ditto (tape) and Zip drives to CDs and DVDs, and I still have a few large-capacity hard drives about.

Each of these solutions worked OK at the time, but today, when cheap cloud storage is running rampant across the digisphere, there are new alternatives to these old methods, such as Amazon Glacier.

What is Glacier?

Amazon Glacier is a storage service that works with Amazon S3 to archive files at very low cost: currently, a penny ($0.01) per gigabyte per month. Yes, that means that you can currently store 100 gigabytes of data for one dollar a month. Since it’s stored in the cloud in Amazon’s secure facilities, you don’t need to worry about fire destroying your data, or losing the media on which it’s stored, or bit rot, or many other concerns that you must deal with if you’re storing media yourself.
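
To make the arithmetic concrete, here is a minimal sketch that estimates the monthly storage bill for an archive of a given size. The rate of $0.01 per gigabyte is simply the figure quoted above and may well have changed since; glacier_monthly_cost is a hypothetical helper name, not part of any AWS tool.

```shell
# Rate quoted in this article (USD per gigabyte per month). This is an
# assumption; check current AWS pricing before relying on it.
RATE_PER_GB=0.01

# glacier_monthly_cost is a hypothetical helper, not an AWS command.
# $1 = archive size in gigabytes
glacier_monthly_cost() {
  awk -v gb="$1" -v rate="$RATE_PER_GB" 'BEGIN { printf "%.2f\n", gb * rate }'
}

glacier_monthly_cost 100   # prints 1.00
```

Retrieval and transfer charges are not included in this estimate.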

The catch is this: there is an additional transfer cost if you want to transfer large amounts of data out of Glacier, and moving data from Glacier to S3 takes some time (3-5 hours, according to the FAQ). However, with archived data, you don’t expect to access it very often (only if your trusty local backup dies, for example). This is exactly what Glacier was designed for.

Meet the AWS CLI

I’ve tried, but I just can’t kick the command-line habit. For me, working on the command line is natural, easy and powerful. Unlike Graphical User Interface (GUI) environments, which can be capricious, the AWS CLI works much the same way on every platform it supports, and it’s a great way to transfer files to and from Amazon Glacier. If you intend to follow my example and use the AWS CLI, you should first download and install it on your system. Instructions to do so are here:

Now that we have that out of the way, I’ll assume that everything went well and that you now have the AWS CLI installed…
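
If you would rather not take that on faith, a quick check that the shell can find the tool might look like the sketch below. The check_aws_cli helper name is hypothetical; aws --version is a standard AWS CLI option.

```shell
# check_aws_cli is a hypothetical helper, not part of the AWS CLI.
# It succeeds (exit status 0) only if an `aws` executable is on the PATH.
check_aws_cli() {
  command -v aws >/dev/null 2>&1
}

if check_aws_cli; then
  aws --version    # print the installed version
else
  echo "aws not found on PATH; install the AWS CLI first" >&2
fi
```

If you have not stored credentials yet, running aws configure will prompt for an access key, a secret key, a default region and an output format.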

Setting up a Glacier Archival Bucket in S3

The easiest way to archive items to Glacier is through S3. Basically, you set up an S3 bucket that has a lifecycle rule that moves your files to Glacier after a predetermined time (such as one day).

Create an archival bucket

To begin, create a bucket that you’ll use for your data. You can have many such buckets, if you like. For this example, I’ll store some Free Lossless Audio Codec (FLAC) files, which can take up a bit of space… So, I’ll create a bucket with the AWS CLI to store them in:

aws s3 mb s3://my-flac-audio

Set a bucket lifecycle rule

Next, I’m going to set a lifecycle rule for the bucket that will transfer items to Glacier after a day. You could do this with the AWS Management Console if you want, but I’m going to do this using the command line. Here’s the bucket lifecycle rule, in JavaScript Object Notation (JSON):

{
  "Rules": [
    {
      "ID": "Rule for the Entire Bucket",
      "Status": "Enabled",
      "Prefix": "",
      "Transition": {
        "Days": 1,
        "StorageClass": "GLACIER"
      }
    }
  ]
}

Rather than trying to enter this information on the command line, it’s a good idea to save it in a file, which you can also validate for correctness before passing it to an AWS CLI command. So I’ll save the above JSON block in a file called glacier-rule.json.
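
One way to do that validation, assuming python3 is installed on your system (nothing else in this article requires it), is to run the file through Python’s built-in json.tool module, which exits with a non-zero status on malformed JSON. The validate_json function name is hypothetical.

```shell
# validate_json is a hypothetical helper. It relies on python3 being
# available, which is an assumption about your system.
# $1 = path to a JSON file; exit status 0 means the file parsed cleanly.
validate_json() {
  python3 -m json.tool "$1" >/dev/null
}

if [ -f glacier-rule.json ]; then
  if validate_json glacier-rule.json; then
    echo "glacier-rule.json is well-formed JSON"
  else
    echo "glacier-rule.json has a syntax error" >&2
  fi
fi
```

This only checks JSON syntax. It does not check that the keys are ones the lifecycle API accepts.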

Next, we’ll apply the rule to the bucket we just created using the following command:

aws s3api put-bucket-lifecycle --bucket my-flac-audio --lifecycle-configuration file://glacier-rule.json
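
The AWS documentation now describes put-bucket-lifecycle as a deprecated operation and points to put-bucket-lifecycle-configuration as its replacement. The newer command expects a slightly different document: a Filter object in place of the top-level Prefix, and a Transitions array in place of the single Transition. The sketch below is my reading of that shape rather than part of the walkthrough above; the helper names and the file name glacier-rule-v2.json are made up for illustration.

```shell
# write_lifecycle_rule and apply_lifecycle_rule are hypothetical helpers.

# $1 = output file for the lifecycle configuration
write_lifecycle_rule() {
  cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "Rule for the Entire Bucket",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [
        { "Days": 1, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}

# $1 = bucket name, $2 = configuration file written above
apply_lifecycle_rule() {
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$1" \
    --lifecycle-configuration "file://$2"
}

# Example (needs AWS credentials and an existing bucket):
#   write_lifecycle_rule glacier-rule-v2.json
#   apply_lifecycle_rule my-flac-audio glacier-rule-v2.json
```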

Now, the bucket my-flac-audio will automatically transfer any new files placed in it to Glacier after one day.
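
To confirm that the rule took effect, you can read it back and, once a day or so has passed, list the objects along with their storage class. This is a sketch built on s3api subcommands; the function names are hypothetical, and exactly when the transition happens is up to S3.

```shell
# show_lifecycle_rule and list_storage_classes are hypothetical helpers.

# $1 = bucket name. Prints the lifecycle rules currently attached to it.
show_lifecycle_rule() {
  aws s3api get-bucket-lifecycle-configuration --bucket "$1"
}

# $1 = bucket name. Prints each key alongside its storage class
# (for example STANDARD or GLACIER).
list_storage_classes() {
  aws s3api list-objects-v2 --bucket "$1" \
    --query 'Contents[].[Key, StorageClass]' --output text
}

# Example (needs AWS credentials):
#   show_lifecycle_rule my-flac-audio
#   list_storage_classes my-flac-audio
```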

Archiving files

To use the bucket to archive files, all I need to do is transfer files from my local system into the bucket. There are a number of ways to do this, but I like to use the aws s3 sync command, like so:

aws s3 sync --delete . s3://my-flac-audio

The aws s3 sync command automatically skips files that you’ve already uploaded. With the --delete switch, it also deletes from S3 any files that no longer exist in your local copy of the archive (for example, if you renamed a file).
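
Because --delete removes objects from the bucket, it can be reassuring to preview a sync first. The aws s3 commands accept a --dryrun option that prints the operations without performing them. A minimal sketch, with hypothetical function names:

```shell
# preview_archive_sync and archive_sync are hypothetical helpers.
# $1 = local directory, $2 = bucket name

# Only reports what would be uploaded or deleted; changes nothing.
preview_archive_sync() {
  aws s3 sync --dryrun --delete "$1" "s3://$2"
}

# Performs the real sync, including deletions.
archive_sync() {
  aws s3 sync --delete "$1" "s3://$2"
}

# Example (needs AWS credentials):
#   preview_archive_sync . my-flac-audio
#   archive_sync . my-flac-audio
```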

By setting up a Glacier archival bucket like this, you can run the aws s3 sync command any time you want to update your archive in the cloud. Easy!

For More Information

This article just scratches the surface of what you can do with the AWS CLI, Amazon Glacier and Amazon S3. For more information, have a look at the official AWS documentation: