A couple of weeks ago, Ashley showed us how to automatically create simple backups of your WordPress site on your server. In this article I’m going to expand on that, showing how we can automatically remove old backups and copy backups to Amazon Glacier for safer storage.
Removing Old Backups
First, let’s add a line to Ashley’s script to remove old backups.
rm -f $(date +%Y%m%d* --date='1 month ago')
I’m using a date command to get the date 1 month ago and removing any backup files whose names start with that date. For example, if the script is running on July 24th, it will remove any backups created on June 24th. So as long as your script runs every day, it will clean up backups from a month ago.
It’s worth noting that the date command is different on OS X, so you will need to use the following instead.
rm -f $(date -v-1m +%Y%m%d*)
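If the script does not run every day, the date-string match above will skip some files. As an alternative, here is a minimal sketch of an age-based cleanup using `find` instead of `date`. The helper name `prune_backups` and its arguments are hypothetical, and it assumes your backups live in a dedicated folder.

```shell
#!/usr/bin/env bash
# prune_backups is a hypothetical helper, not part of the original script.
# It deletes regular files in DIR whose modification time is more than
# DAYS days old, optionally limited to names matching PATTERN.
prune_backups() {
    local dir="$1" days="${2:-30}" pattern="${3:-*}"
    # -maxdepth 1 keeps find from descending into subdirectories.
    find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime +"$days" -delete
}
```

For example, `prune_backups ./backups 30` would remove anything in `./backups` older than 30 days. The `-maxdepth` and `-delete` options are available in both GNU and BSD (OS X) versions of `find`, so the same line should work on either system.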
Since we’re going to be transferring backup files and paying for storage, it makes sense to compress our backups.
Doing that is as simple as running the gzip command. Here’s what our script looks like now:
As you can see, I’ve rejigged things a bit. I’ve added hours, minutes, and seconds to the date string so we can run it more than once per day and it will not overwrite the previous backup files. I’ve also removed the cd, so now you can run this script from any WordPress install directory. As a result, the line you add to your crontab is now a little different…
0 5 * * 0 cd /srv/www/bradt.ca/public_html; /home/bradt/scripts/backup.sh
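To illustrate the timestamped naming and compression described above, here is a minimal sketch. The helper names `backup_basename` and `compress_backup` are hypothetical, and the exact date format is an assumption chosen so that names still start with the year, month, and day.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the original script.

# Prints a name like 20140724050000 (date plus hours, minutes, seconds),
# so two runs on the same day produce different file names.
backup_basename() {
    date +%Y%m%d%H%M%S
}

# Compresses a dump in place with gzip and prints the resulting .gz path.
compress_backup() {
    local file="$1"
    gzip -f "$file" && printf '%s\n' "$file.gz"
}
```

Because the name still begins with `%Y%m%d`, a cleanup that matches on the date prefix would continue to find these files.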
WP-CLI Not Required
A little note about WP-CLI. We’re using it in our scripts because it reads all the database settings from the wp-config.php so we don’t have to include them in our script. However, if you don’t have WP-CLI installed on your server and don’t want to install it, you can certainly replace the wp db export command with a mysqldump command:
DB_NAME=sample_db
DB_USERNAME=sample_user
DB_PASSWORD=sample_pw
mysqldump -u "$DB_USERNAME" --password="$DB_PASSWORD" -h localhost "$DB_NAME" > "$SQL_FILE" 2>&1
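If you would rather not hardcode the credentials, one option is to pull them out of wp-config.php yourself. The following is only a rough sketch: `wp_config_value` is a hypothetical helper, and it assumes each `define()` sits on a single line and that the values contain no quote characters.

```shell
#!/usr/bin/env bash
# wp_config_value is a hypothetical helper, not part of WP-CLI or WordPress.
# It prints the value of a constant such as DB_NAME from a wp-config.php file.
# Assumes single-line define() calls and values without quote characters.
wp_config_value() {
    local key="$1" file="${2:-wp-config.php}"
    sed -n "s/^[[:space:]]*define([[:space:]]*['\"]${key}['\"][[:space:]]*,[[:space:]]*['\"]\(.*\)['\"][[:space:]]*);.*$/\1/p" "$file" | head -n 1
}

# Example usage from a WordPress install directory:
# DB_NAME=$(wp_config_value DB_NAME)
# DB_USERNAME=$(wp_config_value DB_USER)
# DB_PASSWORD=$(wp_config_value DB_PASSWORD)
```

WP-CLI handles unusual configurations far more reliably than a regular expression can, so treat this as a fallback for simple setups.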
Now onward, to sending our backups to the cloud…
Why Remote Backups?
No matter how much redundancy your web host claims to have, there’s still a possibility that a failure could result in data loss. And so it’s a good idea to do your own backups in addition to whatever your web host is doing and store them somewhere other than your server.
Why Amazon Glacier over S3?
Like Amazon S3, Amazon Glacier is a storage service but is designed for data archiving and backup. Whereas you can retrieve your data from S3 almost instantly, data retrieval from Glacier can take several hours. The big advantage of Glacier is cost. It costs only $0.01 per GB per month, whereas S3 costs two to three times that.
I thought we could use Glacier only, but after digging into it, it turns out that Glacier doesn’t have a UI for listing and retrieving files. Uploading files to Glacier is far from straightforward as well. Thankfully Amazon has a way to copy S3 files to Glacier automatically and use the S3 UI for listing and restoring them.
So we’ll upload the backup files to S3, then let Amazon handle the Glacier bits, including removing old files.
Setting Up an S3 Bucket
First, let’s create a new S3 bucket to hold our backups.
Once the bucket is created, expand the Lifecycle panel and click Add rule. In step 1, just leave it set to Whole Bucket and proceed by clicking the Configure Rule button.
In step 2, select Archive then Permanently Delete from the select box. Then enter 0 in the Archive to the Glacier Storage Class textbox so that uploads to your bucket are moved to Glacier as soon as possible. In the second textbox, enter the number of days after upload to remove the backup files. I’ve decided to keep my backups for a year.
Now click the Review button and then the Save Rule button.
Great, now any files uploaded to your bucket will be moved to Glacier within a day but you can still manage the files using S3. You get all the benefits of S3 with the cost of Glacier. Neat.
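The same rule can also be expressed as a lifecycle configuration and applied from the command line once the AWS CLI (covered later in this article) is installed. This is a sketch under a few assumptions: `write_lifecycle_json` is a hypothetical helper, the rule ID is an arbitrary label, and the user applying it needs permission to change the bucket’s lifecycle configuration.

```shell
#!/usr/bin/env bash
# write_lifecycle_json is a hypothetical helper. It writes a lifecycle rule
# that archives every object to Glacier after 0 days and deletes it after
# EXPIRE_DAYS days (365 by default, matching the one-year retention above).
write_lifecycle_json() {
    local out="$1" expire_days="${2:-365}"
    cat > "$out" <<EOF
{
  "Rules": [
    {
      "ID": "archive-then-delete",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Transitions": [ { "Days": 0, "StorageClass": "GLACIER" } ],
      "Expiration": { "Days": ${expire_days} }
    }
  ]
}
EOF
}

# Example (requires the AWS CLI and suitable credentials):
# write_lifecycle_json lifecycle.json 365
# aws s3api put-bucket-lifecycle-configuration \
#     --bucket backups-bradt-ca \
#     --lifecycle-configuration file://lifecycle.json
```

An empty prefix in the filter applies the rule to the whole bucket, which corresponds to the Whole Bucket choice in step 1.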
Setting Up an AWS User
Now that we have a bucket, we need a user with permissions to upload to it. For details on how to do this, see our documentation or check out the screencast below. Be sure to hang onto your Access Keys as you will need them below.
Installing AWS CLI
Amazon offers an official set of command line tools for working with all its services including Glacier. At the moment the AWS CLI requires Python 2.6.5+ and pip (Python’s package manager).
I had to install pip on my server, and it was pretty easy…
$ wget https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py
Then to install AWS CLI it was just another command…
$ pip install awscli
Uploading to S3
To upload to S3, we first need to give the AWS CLI the Access Keys of the user we created earlier…
$ aws configure
AWS Access Key ID [None]: **************
AWS Secret Access Key [None]: ***************************
Default region name [None]: us-east-1
Default output format [None]: text
Now it’s very simple to upload a file to our S3 bucket…
aws s3 cp $SQL_FILE.gz s3://backups-bradt-ca/ --storage-class REDUCED_REDUNDANCY
I’m setting the Reduced Redundancy storage class here as it is a bit less expensive than Standard (for the few hours before it gets switched to Glacier) and as these are backups we don’t need the 99.999999999% durability of Standard. 99.99% durability is plenty.
And that’s it. Just a matter of updating our script to run this command.
As you can see, I’ve added a couple of checks at the top of the script to make sure you’re running it from a WordPress installation folder and to make sure your backups folder exists.
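For illustration, the checks and the optional upload could be sketched along these lines. The function names are hypothetical and this is not the original script, just one way to express the same ideas.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the checks described above.

# Succeeds only if the given directory (default: current) contains wp-config.php.
is_wp_dir() {
    [ -f "${1:-.}/wp-config.php" ]
}

# Succeeds only if the backups folder already exists.
has_backup_dir() {
    [ -d "$1" ]
}

# Uploads FILE to the S3 bucket BUCKET; does nothing when BUCKET is empty,
# so the script can be run with or without a bucket name.
upload_backup() {
    local file="$1" bucket="$2"
    if [ -z "$bucket" ]; then
        return 0
    fi
    aws s3 cp "$file" "s3://$bucket/" --storage-class REDUCED_REDUNDANCY
}
```

A script built on these might exit early with an error message when `is_wp_dir` or `has_backup_dir` fails, and pass its first argument (`$1`) to `upload_backup` as the bucket name.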
To run the script and copy the files to S3, simply run it with the bucket name, like this:
$ ./backup.sh backups-bradt-ca
If you don’t want to copy files to S3, simply omit the bucket name:
$ ./backup.sh
So there you have it, a fairly simple setup to back up your WordPress site and store it remotely.
You may also want to consider using our WP Offload S3 plugin to copy files to S3 as they are uploaded to the Media Library. Then not only are the files stored on Amazon’s 99.999999999% durable cloud rather than your server, but you can also serve them to your site visitors from S3 or through CloudFront.
This alone isn’t a great backup solution. For example, it doesn’t allow you to restore an accidentally deleted file. However, you can enable versioning on your S3 bucket to fill that gap and configure lifecycle rules to move older versions of a file to Glacier and keep costs down. Sounds like something worth diving into in another article.