Backup Your WordPress Site to Amazon Glacier

A couple of weeks ago, Ashley showed us how to automatically create simple backups of your WordPress site on your server. In this article I’m going to expand on that, showing how we can automatically remove old backups and copy backups to Amazon Glacier for safer storage.

Removing Old Backups

First, let’s add a line to Ashley’s script to remove old backups.

rm -f $(date +%Y%m%d* --date='1 month ago')

I’m using the date command to get the date one month ago and removing any backup files created on that date. For example, if the script runs on July 24th, it will remove any backups created on June 24th. So as long as your script runs every day, it will clean up backups from a month ago.

It’s worth noting that the date command is different on OS X, so you will need to use the following instead.

rm -f $(date -v-1m +%Y%m%d*)

Compressing Backups

Since we’re going to be transferring backup files and paying for storage, it makes sense to compress our backups.

Doing that is as simple as running the gzip command. Here’s what our script looks like now:
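
In sketch form, and assuming WP-CLI is installed and backups live in a ../backups directory alongside the install (the exact filenames, the flags, and archiving only the uploads folder are assumptions carried over from Ashley’s script), it’s something like this:

#!/bin/bash

# Sketch of the backup script. Run it from a WordPress install directory;
# it assumes WP-CLI is installed and a ../backups directory exists.

BACKUP_DIR=../backups
DATE_STRING=$(date +%Y%m%d-%H%M%S)
SQL_FILE=$DATE_STRING-database.sql
FILES_FILE=$DATE_STRING-files.tar.gz

# Export the database (WP-CLI reads the credentials from wp-config.php)
wp db export $BACKUP_DIR/$SQL_FILE --add-drop-table --quiet

# Archive the uploads folder
tar -czf $BACKUP_DIR/$FILES_FILE wp-content/uploads

# Compress the database dump
gzip $BACKUP_DIR/$SQL_FILE

# Remove backups created a month ago (use "date -v-1m +%Y%m%d*" on OS X)
rm -f $BACKUP_DIR/$(date +%Y%m%d* --date='1 month ago')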

As you can see, I’ve rejigged things a bit. I’ve added hours, minutes, and seconds to the date string so we can run it more than once per day and it will not overwrite the previous backup files. I’ve also removed the cd, so now you can run this script from any WordPress install directory. As a result, the line you add to your crontab is now a little different…

0 5 * * 0 cd /srv/www/bradt.ca/public_html; /home/bradt/scripts/backup.sh

WP-CLI Not Required

A little note about WP-CLI. We’re using it in our scripts because it reads all the database settings from wp-config.php, so we don’t have to include them in our script. However, if you don’t have WP-CLI installed on your server and don’t want to install it, you can certainly replace the wp db export command with a mysqldump command:

DB_NAME=sample_db
DB_USERNAME=sample_user
DB_PASSWORD=sample_pw

mysqldump -u $DB_USERNAME --password=$DB_PASSWORD -h localhost $DB_NAME > $SQL_FILE

Now onward, to sending our backups to the cloud…

Why Remote Backups?

No matter how much redundancy your web host claims to have, there’s still a possibility that a failure could result in data loss. And so it’s a good idea to do your own backups in addition to whatever your web host is doing and store them somewhere other than your server.

Why Amazon Glacier over S3?

Like Amazon S3, Amazon Glacier is a storage service, but it is designed for data archiving and backup. Whereas you can retrieve your data from S3 almost instantly, data retrieval from Glacier can take several hours. The big advantage of Glacier is cost: just $0.01 per GB per month, whereas S3 costs two to three times that.

I thought we could use Glacier only, but after digging into it, it turns out that Glacier doesn’t have a UI for listing and retrieving files. Uploading files to Glacier is far from straightforward as well. Thankfully Amazon has a way to copy S3 files to Glacier automatically and use the S3 UI for listing and restoring them.

So we’ll upload the backup files to S3, then let Amazon handle the Glacier bits, including removing old files.

Setting Up an S3 Bucket

First, let’s create a new S3 bucket to hold our backups.

Creating an S3 Bucket

Once the bucket is created, expand the Lifecycle panel and click Add rule. In step 1, just leave it set to Whole Bucket and proceed by clicking the Configure Rule button.

In step 2, select Archive then Permanently Delete from the select box. Then enter zero (0) in the Archive to the Glacier Storage Class textbox so that uploads to your bucket are moved to Glacier as soon as possible. In the second textbox, enter the number of days after upload to remove the backup files. I’ve decided to keep my backups for a year.

S3 Lifecycle Settings

Now click the Review button and then the Save Rule button.

Great, now any files uploaded to your bucket will be moved to Glacier within a day but you can still manage the files using S3. You get all the benefits of S3 with the cost of Glacier. Neat.
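
If you’d rather script this step than click through the console, the AWS CLI can apply an equivalent lifecycle configuration. This is only a sketch: the bucket name is the example used later in this article and the rule ID is made up.

# Move objects to Glacier as soon as possible and delete them after a year
# (bucket name and rule ID are examples)
aws s3api put-bucket-lifecycle-configuration \
  --bucket backups-bradt-ca \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-and-expire-backups",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{"Days": 0, "StorageClass": "GLACIER"}],
      "Expiration": {"Days": 365}
    }]
  }'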

Setting Up an AWS User

Now that we have a bucket, we need a user with permissions to upload to it. For details on how to do this, see our documentation or check out the screencast below. Be sure to hang on to your Access Keys as you will need them below.
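
If you end up with the AWS CLI installed and configured with an administrator’s keys (we install the CLI in the next section), you could also create the user from the command line. A rough sketch; the user name, policy name, and bucket name are all just examples:

# Create a dedicated backups user with a minimal inline policy that only
# allows uploads to the example bucket, then generate its Access Keys
aws iam create-user --user-name wp-backups
aws iam put-user-policy \
  --user-name wp-backups \
  --policy-name upload-to-backups-bucket \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::backups-bradt-ca/*"
    }]
  }'
aws iam create-access-key --user-name wp-backups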

Installing AWS CLI

Amazon offers an official set of command line tools for working with all its services including Glacier. At the moment the AWS CLI requires Python 2.6.5+ and pip (Python’s package manager).

I had to install pip on my server, and it was pretty easy…

$ wget https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py

Then to install AWS CLI it was just another command…

$ pip install awscli  

Uploading to S3

To upload to S3, we first need to give the AWS CLI the Access Keys of the user we created earlier…

$ aws configure
AWS Access Key ID [None]: **************
AWS Secret Access Key [None]: ***************************
Default region name [None]: us-east-1
Default output format [None]: text

Now it’s very simple to upload a file to our S3 bucket…

aws s3 cp $SQL_FILE.gz s3://backups-bradt-ca/ --storage-class REDUCED_REDUNDANCY

I’m setting the Reduced Redundancy storage class here as it is a bit less expensive than Standard (for the few hours before the file gets switched to Glacier), and since these are backups, we don’t need the 99.999999999% durability of Standard. 99.99% durability is plenty.

And that’s it. Just a matter of updating our script to run this command.
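
In sketch form (the variable names, messages, and exact tests below are assumptions), the changes look something like this:

# Additions at the top of the script: bail out early if this doesn't look
# like a WordPress install or the backups directory is missing
if [ ! -f wp-config.php ]; then
    echo "Please run this script from a WordPress install directory." >&2
    exit 1
fi

BACKUP_DIR=../backups
if [ ! -d $BACKUP_DIR ]; then
    echo "Backups directory $BACKUP_DIR not found." >&2
    exit 1
fi

# ... database export, uploads archive, gzip, and cleanup as before ...

# Addition at the bottom: if a bucket name was passed as the first
# argument, copy the compressed backups to S3
if [ -n "$1" ]; then
    aws s3 cp $BACKUP_DIR/$SQL_FILE.gz s3://$1/ --storage-class REDUCED_REDUNDANCY
    aws s3 cp $BACKUP_DIR/$FILES_FILE s3://$1/ --storage-class REDUCED_REDUNDANCY
fi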

As you can see, I’ve added a couple of checks at the top of the script to make sure you’re running it from a WordPress installation folder and to make sure your backups folder exists.

To run the script and copy the files to S3, simply run it with the bucket name, like this:

$ ./backup.sh backups-bradt-ca

If you don’t want to copy files to S3, simply omit the bucket name:

$ ./backup.sh

So there you have it, a fairly simple setup to back up your WordPress site and store it remotely.

You may also want to consider using our WP Offload S3 plugin to copy files to S3 as they are uploaded to the Media Library. Then not only are the files stored on Amazon’s 99.999999999% durable cloud rather than your server, but you can also serve them to your site visitors from S3 or through CloudFront.

This alone isn’t a great backup solution. For example, it doesn’t allow you to restore an accidentally deleted file. However, you can enable versioning on your S3 bucket to fill that gap and configure lifecycle rules to move older versions of a file to Glacier and keep costs down. Sounds like something worth diving into in another article.
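
As a teaser, turning on versioning is a one-liner with the AWS CLI (the bucket name below is just a placeholder):

# Keep old versions of objects so accidental deletions can be recovered
aws s3api put-bucket-versioning \
  --bucket your-media-bucket \
  --versioning-configuration Status=Enabled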

About the Author

Brad Touesnard

As founder of Delicious Brains Inc., Brad wears many hats; from coding and design, to marketing and partnerships. Before starting Delicious Brains, Brad was a busy freelance web developer, specializing in front-end development.

  • Thanks for this article, I was searching for an alternative to my FTP backup, never thought about using Glacier.

    Just a remark about the deletion of old backups: the backup from the 31st of a month will not be deleted one month later, except for July and December.
    You can use find instead: "find / -mtime +30 | xargs /bin/rm -rf" or "find / -mtime +30 -exec rm -rf {} \;" will delete all the files and directories older than 30 days under .

    This will also work if the script is not run every day.

  • Nice article. Bookmarked for future reference. I wrote a similar blog post, AWS S3 WordPress Integration, at http://hackpundit.com http://bit.ly/1KfwFeJ. Would love to try out the above as we grow. Thanks

  • monoloops

    Thanks for the great tutorial, I’ve learned a lot from you guys. I have a small problem with DB backups. I’ve set up a cron job every 5 minutes to do a backup just for testing purposes, but no matter what I’m using (WP-CLI or mysqldump) my database isn’t backed up. Just the uploads folder. I’ve checked a million times for a typo or any other mistake but can’t figure it out 🙁

    • Have you tried running the wp db export command without the --quiet flag?

      wp db export ../backups/$SQL_FILE --add-drop-table --url=http://blah.com

  • pressaholic webdev

    Thanks for this excellent article!

    Is there a way to use Ashley’s backup source code and upload to a Dropbox account?

    Cheers!

  • Andy Eblin

    Replying a bit late to this. Seem to be able to get things working fine if I fire the backup myself, but CRON doesn’t seem to want to run things. Could you provide a crontab setting that includes the s3 work toward the end?

    • rickysynnot

      Same… Have you had any luck @andyeblin:disqus ?

  • rickysynnot

    This is awesome @bradt66:disqus, but no matter whether I run the cron event as root or as the user ‘ricky’, the files never hit the bucket. When I run it manually as @andyeblin:disqus did, both the DB and uploads folders are created. When cron runs it, the DB file is not created and the uploads one is.

    Thoughts? How should I approach this? Use a different crontab? Any help would be greatly appreciated!