Developer’s Guide to WP Offload Media

On the surface, the way WP Offload Media works seems simple: it copies WordPress Media Library items to a cloud storage provider such as Amazon S3 and then makes sure WordPress uses those offloaded files in content. However, as the saying goes, “The Devil’s in the details”.

This document aims to cover the basics of how and when WP Offload Media offloads Media Library items to a cloud storage provider such as Amazon S3, DigitalOcean Spaces or Google Cloud Storage. It also endeavours to cover the basic details of how and when WP Offload Media then rewrites media URLs found in the site’s content to use those offloaded files, including how a Content Delivery Network (CDN) such as Amazon CloudFront, Cloudflare or StackPath might be involved.

We also cover when offloaded media is updated, downloaded, or removed from cloud storage.

Each subject is first introduced at a reasonably high level, before diving into some details that we often find useful to understand, particularly when trying to work out why something is not working as expected.

Along with descriptions of how and why WP Offload Media does what it does, some critical hooks are briefly discussed, as well as data that is important to how WP Offload Media works.

The following areas are covered:

  • Cloud Storage Actions
  • Rewriting URLs

Cloud Storage Actions

Arguably the most important function of WP Offload Media is to copy files from a WordPress site’s uploads folder to a cloud storage provider’s bucket.

There are two primary ways that copying files to a bucket is initiated: automatically, because something happened related to a Media Library item, or manually, because someone explicitly asked WP Offload Media to copy files to the bucket.

Similarly, WP Offload Media must ensure updates to a Media Library item’s files are reflected in the bucket, including manually removing an item from cloud storage.

There are also situations when a Media Library item’s files need to be copied back from the bucket to the server, such as when WP Offload Media’s Remove Files From Server setting is in use and an image is being edited or optimized. It is also possible to manually request that a Media Library item’s files be downloaded.

Automatic Offload

When someone uploads a new image, audio, video or other type of file to the site, either directly within the “Media” area of the admin dashboard or while editing content, WordPress takes the file and stores it on the site’s server, usually somewhere within the wp-content/uploads/ folder.

WordPress then creates some database records to keep track of this newly uploaded file so that it can be shown in the “Media” admin dashboard area, otherwise known as the “Media Library”.

For some types of Media Library items, such as images, WordPress then starts one or more processes to make sure the new item can be used by other areas of the WordPress site. For example, when an image file is uploaded WordPress creates a few copies of the image scaled and/or cropped to different sizes such as 150×150, 300×300 and 1024×1024. This means a Media Library item may consist of many files, not just the original file that was uploaded.

As soon as WordPress has saved the new file into the uploads folder and has created the Media Library database records, WP Offload Media notices the new Media Library item and starts offloading its files to the configured cloud storage provider.

WP Offload Media determines which storage service (e.g. AWS) it should upload the files to, the bucket name to use, and the path for the new objects from its settings. Settings are either defined in the WordPress site’s wp-config.php file or saved in WP Offload Media’s settings page. Any settings found in the wp-config.php file take precedence over settings saved in the WordPress dashboard.
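The precedence rule above can be pictured as a simple merge. The following is a minimal Python sketch of the idea only, not WP Offload Media's own code; the function name and the setting keys used here are assumptions made for the example.

```python
def effective_settings(saved: dict, defined: dict) -> dict:
    """Merge settings so that values defined in wp-config.php take precedence.

    `saved` stands in for settings saved via the settings page, and `defined`
    for settings defined in wp-config.php. Both are hypothetical dictionaries.
    """
    merged = dict(saved)    # start from the values saved in the dashboard
    merged.update(defined)  # anything defined in wp-config.php overrides them
    return merged


if __name__ == "__main__":
    saved = {"provider": "aws", "bucket": "dashboard-bucket"}
    defined = {"bucket": "hellfishmedia"}
    print(effective_settings(saved, defined))
```

A setting that is only saved in the dashboard is still used; it is only overridden when the same setting is also defined in wp-config.php.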

When WP Offload Media has finished uploading the first file associated with a Media Library item to the bucket, it creates some database records to keep track of:

  • Which storage provider the Media Library item was offloaded to
  • Which bucket was used
  • The path (key) in the bucket that was set for the file (object)

If WordPress is generating a number of files for the Media Library item, then WP Offload Media offloads each new file that is created.

All this automatic offloading happens synchronously. The user sees that the media file has been successfully uploaded only once all of the following have happened:

  • WordPress has finished saving the file to the server
  • WordPress has generated all expected files associated with the Media Library item
  • WordPress has saved the item’s details in the database
  • WP Offload Media has finished offloading the files and saving its own database data

Automatic Offload: Hooks

There are two core WordPress filters that are integral to automatic offloads.

wp_unique_filename

When WordPress checks that a new Media Library file has a unique filename, WP Offload Media ensures that the file will not overwrite an existing file in the bucket if offloaded.

WP Offload Media also checks that there aren’t any files that have been offloaded and removed from the server whose names would clash if they were copied back to the server.

If there is any potential clash, WP Offload Media alters the proposed filename in the same way that WordPress versions new uploads that clash with existing files.
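As an illustration of that kind of versioning, here is a minimal Python sketch. It assumes the familiar WordPress pattern of appending "-1", "-2" and so on before the file extension, and it models "names already in use" as a plain set. The function name and inputs are hypothetical, not the plugin's API.

```python
import os
from typing import Set


def unique_filename(proposed: str, taken: Set[str]) -> str:
    """Return `proposed`, or a numbered variant that is not in `taken`.

    `taken` is assumed to hold every name that would clash: objects already
    in the bucket plus offloaded files that were removed from the server.
    """
    if proposed not in taken:
        return proposed
    stem, ext = os.path.splitext(proposed)
    number = 1
    # Keep incrementing the suffix until a free name is found.
    while f"{stem}-{number}{ext}" in taken:
        number += 1
    return f"{stem}-{number}{ext}"


if __name__ == "__main__":
    print(unique_filename("puppies.jpg", {"puppies.jpg", "puppies-1.jpg"}))
```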

wp_update_attachment_metadata

This filter is invoked by WordPress when a Media Library item’s first file has been safely saved to the server. It’s called again anytime there are changes to the files that make up a Media Library item, such as a new thumbnail file being saved.

WP Offload Media uses the wp_update_attachment_metadata filter to recognize that something has changed in the Media Library item’s makeup, and uses this as the trigger to offload any files that haven’t been already.

Because this filter can be called many times by WordPress Core while generating thumbnails for a new image, and WP Offload Media might be set up to remove the local files after offload, there are checks to make sure that WordPress no longer needs a file on disk before it is offloaded.

Automatic Offload: Data

When offloading a Media Library item, the following data is most critical.

posts (table)

The Media Library uses posts table records with a post_type of “attachment” for storing some basic information about an item. While WP Offload Media rarely uses these records directly, it does rely on the post_id (stored in the posts table’s ID column) and post_mime_type fields.

The post_id is the unique ID used to tie all the following data together for a Media Library item.

The post_mime_type is critical for setting the “Content Type” of media sent to services like Amazon S3 so the file can be properly downloaded by a browser. Storage services like Amazon S3 are also very picky about the file type and mime type matching when uploading to the bucket, so it is critical that the correct type is used.

_wp_attached_file

The postmeta table record with meta_key of _wp_attached_file denotes the relative path inside the wp-content/uploads/ folder where a Media Library item’s “full size” file can be found.

WP Offload Media relies on this value in order to be able to find a Media Library item’s files on the server and offload them.

If the Media Library is being managed with Year/Month subfolders, then the /YYYY/MM/ part of this record’s value will be used by WP Offload Media when formatting the bucket path and using the Year/Month setting. If no /YYYY/MM/ part is evident, then the post_date from the posts record will be used instead.
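A minimal Python sketch of that fallback follows. It assumes the _wp_attached_file value looks like "2020/11/puppies.jpg"; the function name is hypothetical and the real plugin logic may differ in its details.

```python
import re
from datetime import datetime

# Matches a YYYY/MM/ folder pair at the start of the path or after a slash.
YEAR_MONTH = re.compile(r"(?:^|/)(\d{4})/(\d{2})/")


def year_month_prefix(attached_file: str, post_date: datetime) -> str:
    """Pick the Year/Month part for the bucket path.

    Prefer the YYYY/MM/ folders found in the attached file path, and fall
    back to the post_date when the path has no such folders.
    """
    match = YEAR_MONTH.search(attached_file)
    if match:
        return f"{match.group(1)}/{match.group(2)}/"
    return post_date.strftime("%Y/%m/")


if __name__ == "__main__":
    uploaded = datetime(2021, 3, 5)
    print(year_month_prefix("2020/11/puppies.jpg", uploaded))
    print(year_month_prefix("puppies.jpg", uploaded))
```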

_wp_attachment_metadata

The postmeta table record with meta_key of _wp_attachment_metadata holds a serialized multi-dimensional array of information about a Media Library item.

Of particular importance to WP Offload Media is the array of sizes that holds information about what additional files have been generated for an image. This array is updated as each new file is created by WordPress, triggering WP Offload Media to offload it.

When a large or rotated image has been uploaded to the Media Library, there will also be an original_image entry that points to the original file name that was uploaded and used to generate the “-scaled” or “-rotated” suffixed full size image, and the thumbnails. The original_image is also offloaded by WP Offload Media.

_wp_attachment_backup_sizes

The related postmeta record with meta_key of _wp_attachment_backup_sizes holds information about the original file names used for an image prior to any edits. The files found in this record are also offloaded.
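To make the relationship between these records concrete, the sketch below gathers every file name that makes up a Media Library item from dictionaries shaped like the unserialized _wp_attachment_metadata and _wp_attachment_backup_sizes values. It is an illustrative Python model with a hypothetical function name, not the plugin's implementation.

```python
from typing import List, Optional


def files_for_item(metadata: dict, backup_sizes: Optional[dict] = None) -> List[str]:
    """List the file names of an item: full size, sizes, original and backups."""
    names = []
    full_size = metadata.get("file", "")
    if full_size:
        # "file" holds a path relative to the uploads folder; keep the name only.
        names.append(full_size.rsplit("/", 1)[-1])
    for size in metadata.get("sizes", {}).values():
        names.append(size["file"])
    if "original_image" in metadata:
        names.append(metadata["original_image"])
    for size in (backup_sizes or {}).values():
        names.append(size["file"])
    # Remove duplicates while keeping the order in which names were found.
    return list(dict.fromkeys(names))


if __name__ == "__main__":
    metadata = {
        "file": "2020/11/puppies-scaled.jpg",
        "sizes": {"thumbnail": {"file": "puppies-150x150.jpg"}},
        "original_image": "puppies.jpg",
    }
    print(files_for_item(metadata))
```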

as3cf_items (table)

WP Offload Media uses this custom database table to store information about where a Media Library item has been offloaded to. On a WordPress multisite installation, there is one as3cf_items table per sub site.

The table keeps track of which Storage Provider the item was offloaded to, along with the bucket name and region of the bucket. For storage providers such as AWS, the region may be blank as a default region exists. For services like DigitalOcean Spaces, a region is always set as it is required and governs the API endpoint that is used.

The as3cf_items table also stores the path (object key) of both the “full size” file and the “original image”, and whether the full size and thumbnail sizes are public or private.

The path and original_path are built from WP Offload Media’s Path, Year/Month and Object Versioning settings, along with the media item’s full size filename. These values are therefore mostly independent of the folder path used by the files on the server.
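As a rough sketch of how such a path could be assembled, the Python function below joins a path prefix, a Year/Month part, an object version string and the file name. The function and parameter names are assumptions for illustration; the example values are taken from the URLs used elsewhere in this document.

```python
def build_object_key(prefix: str, year_month: str, version: str, filename: str) -> str:
    """Join the non-empty parts of an object key with single slashes."""
    parts = [prefix.strip("/"), year_month.strip("/"), version.strip("/"), filename]
    # Settings that are switched off are passed as empty strings and skipped.
    return "/".join(part for part in parts if part)


if __name__ == "__main__":
    print(build_object_key("wp-content/uploads/", "2020/11/", "12131415", "puppies.jpg"))
```

Because the key is built from settings rather than from the server path, changing a setting such as Object Versioning changes the key for newly offloaded items without touching the files on the server.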

When Amazon CloudFront is being used as the delivery provider, and the Private Media setting has been enabled, this table also keeps track of the private path prefix that may be added to the path for files that need signed URLs.

tantan_wordpress_s3

The options or sitemeta table record with the key of tantan_wordpress_s3 is where WP Offload Media stores its settings if the AS3CF_SETTINGS or other defines have not been used in the site’s wp-config.php file to override them. The name comes from the original plugin that WP Offload Media was forked from.

For the sake of security we highly recommend that the AS3CF_SETTINGS named constant be used, especially for settings such as the AWS Access Key ID and Secret Access Key.

Manual Offload

WP Offload Media includes tools that allow site administrators to manage offloaded Media Library items. This includes the ability to offload all Media Library items that have yet to be offloaded, and to use the “Copy to Bucket” actions in the “Media” area of the admin dashboard.

Media Actions

Regardless of whether the bulk offloader tool or Media Library actions are used, each processed Media Library item is offloaded in the same way.

The same underlying process is used as in an automatic offload, except all the Media Library data is complete and therefore all thumbnails can be offloaded at the same time.

Manual Offload: Hooks

None.

Manual Offload: Data

The same data is used as for automatic offloads.

Automatic Update

If a Media Library item is edited, optimized or replaced, WP Offload Media needs to ensure that any changed files are offloaded to the bucket.

In most cases the same underlying process is used as in an automatic offload, except all the Media Library data is complete and can be processed in one go. This is usually the case when an image is edited or optimized, and if using the Enable Media Replace plugin to replace an image without changing its filename.

In some scenarios files are no longer needed in cloud storage, and so WP Offload Media deletes them from the bucket. This happens when thumbnails are regenerated with a supported plugin (e.g. Regenerate Thumbnails) that may remove no longer needed thumbnail sizes. It also happens when Enable Media Replace is used to update a Media Library item and change its filename.

Another form of automatic update is when an offloaded Media Library item is added to or removed from either a WooCommerce or Easy Digital Downloads product file. When a Media Library item is added as a download of a product, the selected file is given a “private” ACL (permission) in the bucket, or moved into a private prefixed path if using signed CloudFront URLs. When removed from a product file, the object is updated with a “public-read” ACL or moved back into the default public path if no longer used in any product files.

Automatic Update: Hooks

wp_update_attachment_metadata

Just as for automatic offloads, WP Offload Media uses the wp_update_attachment_metadata filter to recognize that something has changed in the Media Library item’s makeup, and uses this as the trigger to offload any files that haven’t been already.

While processing this filter WP Offload Media fires some internal filters that check whether there might be a need to remove some no longer needed files from the bucket.

update_attached_file

If a plugin such as Enable Media Replace updates the filename used for a Media Library item, this filter is fired and WP Offload Media ensures the files using the old filename are removed from the bucket.

Automatic Update: Data

The same data is used as for automatic offloads.

as3cf_items (table)

The as3cf_items table’s is_private field is updated when a Media Library item is set as private or public during WooCommerce or EDD product file operations. If a thumbnail size for an image is being used in a product file, then the “private_sizes” array is updated within the extra_info field’s serialized data.

Manual Update

WP Offload Media has a few manual tools that may result in offloaded files being updated.

Just as mentioned in the manual offload section, the “Copy to Bucket” single and bulk actions found in the “Media” section of the WordPress dashboard can be used to re-copy local files from the server to the bucket. This action is not made available if the full size local file is missing.

There are also Media Library item actions for “Make Private in Bucket” and “Make Public in Bucket” that will update the ACL of a bucket object if ACLs are in use. However, if signed CloudFront URLs have been enabled, then making a Media Library item private will move the affected object from its public path to the private prefixed path that secures the object from direct access. Making an item public while using signed CloudFront URLs results in the associated objects being moved to the public path.
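The two behaviours described above can be modelled as a single decision. This is a simplified Python sketch under stated assumptions: the "private/" prefix and the function name are hypothetical, and the ACL is reported as None when the sketch leaves it untouched.

```python
from typing import Optional, Tuple

PRIVATE_PREFIX = "private/"  # hypothetical private path prefix


def set_privacy(key: str, make_private: bool, signed_urls: bool) -> Tuple[str, Optional[str]]:
    """Return (new_key, new_acl) for a privacy change on one object.

    With signed CloudFront URLs the object is moved between the public path
    and the private prefixed path. Otherwise only the ACL changes.
    """
    if signed_urls:
        # Work out the public form of the key first so the move is idempotent.
        public_key = key[len(PRIVATE_PREFIX):] if key.startswith(PRIVATE_PREFIX) else key
        new_key = PRIVATE_PREFIX + public_key if make_private else public_key
        return new_key, None
    return key, "private" if make_private else "public-read"


if __name__ == "__main__":
    key = "wp-content/uploads/2020/11/puppies.jpg"
    print(set_privacy(key, make_private=True, signed_urls=False))
    print(set_privacy(key, make_private=True, signed_urls=True))
```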

A related way that objects may be updated in the bucket is by enabling, updating, or disabling the Private Media settings for the CloudFront Delivery Provider. When enabled or updated you can optionally allow WP Offload Media to move existing private Media Library items to their private prefixed path in the bucket. When disabled, you can have WP Offload Media move existing objects back to the public path (recommended).

In a similar fashion, if the Path, Year/Month or Object Versioning settings are updated in WP Offload Media, then you can optionally have WP Offload Media move the objects into new paths that match the new settings.

Manual Update: Hooks

None.

Manual Update: Data

The same data is used as for automatic updates.

Automatic Download

Sometimes WordPress Core or a plugin needs direct access to a Media Library file in order to perform some action such as edit it, or create a new file from it. If WP Offload Media’s “Remove Files From Server” setting is in use, then the file may need to be temporarily copied back to the server.

Downloading a file from the bucket to the server is usually needed when using the standard WordPress image editing features to crop, rotate or flip an image. Once the action is complete the results are offloaded and the local files removed again.

Another common reason for WP Offload Media to automatically download an offloaded file that has been removed from the server is when a plugin like Regenerate Thumbnails or EWWW Image Optimizer is about to process the file. Again, once the plugin has finished with the downloaded file and the Media Library item is re-offloaded, the local files are removed.

In general, if the “Remove Files From Server” setting is not in use, there is no need for WP Offload Media to perform an automatic download as the Media Library item’s files already exist on the server. If a large number of files are going to be processed in such a way as to require local files then it is recommended that the “Remove Files From Server” setting not be turned on to reduce API requests to the storage provider and improve the speed of the operation.

Automatic Download: Hooks

get_attached_file

The get_attached_file() function is the primary way that WordPress and most plugins obtain the path to a Media Library item’s full size file. When the function invokes this filter, WP Offload Media checks whether the requested offloaded Media Library item’s file is on the server, and if not, will usually return a direct stream wrapper URL to the file.

For example, if “puppies.jpg” was offloaded to the “hellfishmedia” Amazon S3 bucket in the “us-west-1” region, then the following URL might be returned:

s3uswest1://hellfishmedia.s3-us-west-1.amazonaws.com/wp-content/uploads/2020/11/puppies.jpg

WP Offload Media then ensures that any changes to that file are saved to the object in the bucket.

However, the WordPress image editor and some plugins really do need a local file and are unable to use the remote URL, so WP Offload Media downloads the file instead and returns the normal file path.

If the file already exists on the server then WP Offload Media does nothing special and just lets the local file be used.

wp_get_original_image_path

Sometimes WordPress or a plugin needs the “original image” file (e.g. when regenerating thumbnails), and this filter is fired in response. WP Offload Media treats it in the same way as get_attached_file to make sure the original image is accessible if offloaded and removed from the server.

as3cf_get_attached_file_copy_back_to_local

Because WP Offload Media has no way of determining whether a plugin calling the get_attached_file() or wp_get_original_image_path() functions can handle stream wrappers or not, it fires this filter itself to see if there is a need to download a file from the bucket when not on the server.

Plugin authors can implement this filter to let WP Offload Media know they need a local file for the current request for a file.

If this filter is implemented and returns true then WP Offload Media will automatically download the requested file.
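Putting the get_attached_file behaviour and this filter together, the decision can be modelled as below. This is a Python sketch of the described flow only: the function name, its parameters and the download callback are hypothetical, and copy_back_to_local stands in for the result of the as3cf_get_attached_file_copy_back_to_local filter.

```python
from typing import Callable


def resolve_attached_file(
    local_path: str,
    stream_url: str,
    exists_locally: bool,
    copy_back_to_local: bool,
    download: Callable[[str, str], None],
) -> str:
    """Decide which path or URL a caller of get_attached_file() receives."""
    if exists_locally:
        # Nothing special to do, the local file is used as-is.
        return local_path
    if copy_back_to_local:
        # The caller needs a real file, so fetch it and return the normal path.
        download(stream_url, local_path)
        return local_path
    # Otherwise hand back the stream wrapper URL to the object in the bucket.
    return stream_url


if __name__ == "__main__":
    fetched = []
    result = resolve_attached_file(
        "/var/www/wp-content/uploads/2020/11/puppies.jpg",  # hypothetical server path
        "s3uswest1://hellfishmedia.s3-us-west-1.amazonaws.com/wp-content/uploads/2020/11/puppies.jpg",
        exists_locally=False,
        copy_back_to_local=True,
        download=lambda source, target: fetched.append((source, target)),
    )
    print(result, len(fetched))
```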

Automatic Download: Data

The same data is used as for automatic offloads.

Manual Download

WP Offload Media has a few tools that may result in offloaded files being downloaded manually.

Just as there are “Copy to Bucket” single and bulk actions, there are also “Copy to Server from Bucket” actions found in the “Media” section of the WordPress admin dashboard. These actions can be used to copy files from the bucket back to the server. This action is not made available if the full size local file is already on the server.

WP Offload Media also has a “Download all files from bucket to server” tool available in the settings page. This tool inspects all offloaded Media Library items and downloads any files that are missing from the server.

The “Remove all files from bucket” tool, also available in WP Offload Media’s settings page, and the “Remove from Bucket” single and bulk actions, available in the “Media” section of the WordPress admin dashboard, will also download files missing from the server before removing them from the bucket.

Manual Download: Hooks

None.

Manual Download: Data

The same data is used as for automatic downloads.

Automatic Remove

If an offloaded Media Library item is permanently deleted, then WP Offload Media will remove its files from the bucket too.

As mentioned in Automatic Update, tools like Regenerate Thumbnails can be used to force the removal of thumbnail sizes that are no longer needed. When a tool like that runs, WP Offload Media will remove those files from the bucket.

If an edited image is restored to its original condition, then WP Offload Media will remove the no longer needed edited files from the bucket.

Automatic Remove: Hooks

wp_update_attachment_metadata

Just as for automatic offloads, WP Offload Media uses the wp_update_attachment_metadata filter to recognize that something has changed in the Media Library item’s makeup, and uses this as the trigger to offload any files that haven’t been already.

While processing this filter WP Offload Media fires some internal filters that check whether there might be a need to remove some no longer needed files from the bucket.

Automatic Remove: Data

The same data is used as for automatic offloads.

_wp_attachment_backup_sizes

Of special interest is the postmeta record with meta_key of _wp_attachment_backup_sizes that holds information about the original file names used for an image prior to any edits. When an edited image is restored the values in this record are likely to overwrite the sizes in the _wp_attachment_metadata record. If the IMAGE_EDIT_OVERWRITE define has been set to true, then the edited files are about to be removed, and so WP Offload Media ensures their objects are removed too.

If the IMAGE_EDIT_OVERWRITE define is not in play, then WordPress will retain the edited files in the backups data, and so WP Offload Media will not remove their objects either.
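One way to picture the restore behaviour is as a set difference: files referenced before the restore, minus files still referenced afterwards, minus files that WordPress retains in the backups data. The Python sketch below is only a model of that reasoning, with hypothetical names and illustrative file names.

```python
from typing import Set


def objects_to_remove(before: Set[str], after: Set[str], retained_backups: Set[str]) -> Set[str]:
    """Return file names whose objects are no longer needed after a restore.

    `retained_backups` is assumed to hold the file names still listed in the
    backup sizes record after the restore. With IMAGE_EDIT_OVERWRITE set to
    true the edited files are not retained, so they end up in the result.
    """
    return before - after - retained_backups


if __name__ == "__main__":
    edited = {"puppies-e1605000000000.jpg", "puppies-e1605000000000-150x150.jpg"}
    restored = {"puppies.jpg", "puppies-150x150.jpg"}
    # IMAGE_EDIT_OVERWRITE true: nothing retained, edited objects are removed.
    print(sorted(objects_to_remove(edited, restored, set())))
    # IMAGE_EDIT_OVERWRITE not set: edited files are retained, nothing removed.
    print(sorted(objects_to_remove(edited, restored, edited)))
```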

as3cf_items (table)

WP Offload Media uses this custom database table to store information about where a Media Library item has been offloaded to. If a Media Library item is permanently deleted then the associated as3cf_items record is removed too.

Manual Remove

WP Offload Media provides a “Remove from Bucket” action that can be used in the “Media” area of the WordPress admin dashboard, and a “Remove all files from bucket” tool in its settings page.

Both of these means of removing objects from the bucket will download any files missing from the server before removing their objects. If a missing file fails to download, WP Offload Media will not remove the object, and will not remove its metadata.

Manual Remove: Hooks

None.

Manual Remove: Data

The same data is used as for automatic removals.

as3cf_items (table)

WP Offload Media uses this custom database table to store information about where a Media Library item has been offloaded to. If a remove from bucket operation is successful then the as3cf_items record is deleted.

However, if WP Offload Media could not download any files missing from the server before trying to remove the objects, the as3cf_items record will remain as there are associated objects in the bucket.
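The download-before-remove safeguard can be sketched as follows. This Python model is deliberately simplified to an all-or-nothing check per Media Library item, and the function name and callbacks are hypothetical stand-ins for the storage operations.

```python
from typing import Callable, Iterable, Set


def remove_item_from_bucket(
    files: Iterable[str],
    on_server: Set[str],
    download: Callable[[str], bool],
    delete_object: Callable[[str], None],
) -> bool:
    """Remove an item's objects only if every file is safely on the server.

    Returns True when the objects were removed, meaning the as3cf_items
    record could be deleted, and False when a download failed and both the
    objects and the record must stay.
    """
    files = list(files)
    for name in files:
        if name not in on_server:
            if not download(name):
                return False  # keep the objects and the record
            on_server.add(name)
    for name in files:
        delete_object(name)
    return True


if __name__ == "__main__":
    deleted = []
    ok = remove_item_from_bucket(
        ["puppies.jpg", "puppies-150x150.jpg"],
        {"puppies.jpg"},
        download=lambda name: True,
        delete_object=deleted.append,
    )
    print(ok, deleted)
```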

Rewriting URLs

Once Media Library items have been offloaded to cloud storage, it is then up to WP Offload Media to make sure those offloaded files are used on the site.

WP Offload Media tries very hard to be a “good citizen” and maximise compatibility with WordPress and third party themes and plugins.

To accomplish this, WP Offload Media does not update any content stored in the database to use cloud storage URLs. Instead, it dynamically rewrites native local media URLs to cloud storage URLs as they are retrieved from the database.

Turning off the “Rewrite Media URLs” setting or deactivating the plugin will result in WordPress using standard local media URLs again. If WP Offload Media’s “Remove Files From Server” option has been in use this could result in 404 file not found errors.

Rewriting Local URLs to Delivery Provider URLs

Under the hood, WordPress stores any image, audio or video content you add to a post or page as absolute URLs.

This means when you add a 300×300 thumbnail version of “puppies.jpg” to a page, WordPress will by default embed something like the following into the content which will be stored in the database on save.

<img src="https://hellfish.media/wp-content/uploads/2020/11/puppies-300x300.jpg" alt="Cute puppies!" class="wp-image-666" srcset="...">

When content is displayed to a site visitor, WP Offload Media rewrites any media URLs it recognizes as having been offloaded.

<img src="https://cdn.hellfish.media/wp-content/uploads/2020/11/12131415/puppies-300x300.jpg" alt="Cute puppies!" class="wp-image-666" srcset="...">

There are two very important changes to the above example src attribute:

  1. The domain has changed from hellfish.media to cdn.hellfish.media.
  2. The path now includes an extra /12131415/ segment.

These changes to the URL come from two separate settings: the custom domain setting under WP Offload Media’s Delivery Provider settings, and from the Object Versioning setting under Storage Provider settings.
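The two changes in the example can be reproduced with a small string-based Python sketch. It is an illustration only: the function name is hypothetical, and the plugin itself works from the path stored for the offloaded item rather than by editing the local URL.

```python
from urllib.parse import urlsplit, urlunsplit


def to_delivery_url(local_url: str, delivery_domain: str, object_version: str = "") -> str:
    """Swap the domain and insert the object version segment before the file name."""
    parts = urlsplit(local_url)
    directory, _, filename = parts.path.rpartition("/")
    if object_version:
        directory = f"{directory}/{object_version}"
    path = f"{directory}/{filename}"
    return urlunsplit((parts.scheme, delivery_domain, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(
        to_delivery_url(
            "https://hellfish.media/wp-content/uploads/2020/11/puppies-300x300.jpg",
            "cdn.hellfish.media",
            "12131415",
        )
    )
```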

It is possible for WP Offload Media to start off using direct URLs to the storage provider, and then switch to a CDN without needing to do anything with the offloaded objects.

WP Offload Media not only rewrites local media URLs to their offloaded version when content is displayed to a site visitor, it also rewrites the URLs when editing content. This is done to ensure that the editor can display the media even if the local files have been removed from the server.

Another huge benefit of WP Offload Media dynamically rewriting local media URLs to cloud storage URLs is that we can display images in the WordPress dashboard that have been set to private.

Rewriting Local URLs to Delivery Provider URLs: Hooks

There are quite a lot of filters that WP Offload Media implements that relate to rewriting local media URLs to cloud storage URLs in displayed content. Here we detail some of the most important filters for posts and pages, but WP Offload Media also makes sure the WordPress Customizer and various other areas are able to use offloaded media.

the_post

This is a very important hook that WP Offload Media implements for URL rewriting. Strictly speaking, WordPress fires the_post as an action rather than a filter.

When this hook is fired, WP Offload Media analyzes the retrieved post_content for any local media URLs that relate to an offloaded Media Library item, and swaps in the cloud storage URL.

The job is made much easier if standard WordPress functionality was used for building the post or page’s content and embedding img tags, as they include a class attribute like “wp-image-666” that tells WP Offload Media the Media Library item’s post_id (666 in this case). With this information WP Offload Media can very quickly check whether it has offloaded that Media Library item, confirm the given local URL is as expected, and then provide a cloud storage URL for the given file.

If any URLs in the content have to be analyzed without the aid of an ID being specified, then WP Offload Media has to work a little harder and deconstruct the URL to find a Media Library item that matches for the path.

To save on processing a local URL over and over again as it is displayed to site visitors, WP Offload Media creates a cache of local URL to ID mappings for each post and page it filters. This means WP Offload Media can check the cache, which is very fast to load, and quickly find the cloud URL if already mapped.
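The fast path described above, reading the ID from the “wp-image-” class and remembering the URL to ID mapping, can be sketched in Python as follows. The regular expressions and the function name are assumptions for illustration, and the cache here is a plain dictionary rather than the plugin's stored cache record.

```python
import re
from typing import Dict

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC = re.compile(r'\ssrc="([^"]+)"')
WP_IMAGE_ID = re.compile(r'\sclass="[^"]*\bwp-image-(\d+)\b')


def map_urls_to_ids(content: str, cache: Dict[str, int]) -> Dict[str, int]:
    """Add a URL to ID mapping for each img tag that carries a wp-image-ID class."""
    for tag in IMG_TAG.findall(content):
        src = SRC.search(tag)
        image_id = WP_IMAGE_ID.search(tag)
        # Only map URLs that are new to the cache and have an ID available.
        if src and image_id and src.group(1) not in cache:
            cache[src.group(1)] = int(image_id.group(1))
    return cache


if __name__ == "__main__":
    html = (
        '<img src="https://hellfish.media/wp-content/uploads/2020/11/puppies-300x300.jpg" '
        'alt="Cute puppies!" class="wp-image-666" srcset="...">'
    )
    print(map_urls_to_ids(html, {}))
```

URLs without a usable ID are left out of this sketch; as described above, those need the slower lookup that deconstructs the URL.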

the_content

This filter fires a little later than the_post, but is processed in the same way by WP Offload Media.

Some plugins use this filter themselves for their custom content, which makes it much easier for WP Offload Media to rewrite URLs for them.

as3cf_filter_post_local_to_provider

This filter is implemented by WP Offload Media so that a third-party theme or plugin can apply it if they do not want to use the the_content filter or other standard WordPress filters that may have undesirable side effects. It is processed in the same way as the_post, the_content and other filters to rewrite local media URLs to their delivery provider version.

This filter is particularly useful in custom or child themes that would like to ensure any generated content is properly filtered.

Rewriting Local URLs to Delivery Provider URLs: Data

The same data is used as for automatic offloads, with the following also playing a critical part in rewrites.

as3cf_items (table)

source_id is a key field of the as3cf_items table when performing lookups for offloaded Media Library items. This is a foreign key to the posts table’s ID field (the post_id), and the related postmeta records that WP Offload Media inspects to validate that a file really is an offloaded Media Library item.

The source_path and original_source_path fields also come into play when attempting to find an offloaded Media Library item that has a matching local URL.

When dealing with private objects that require signed URLs, the extra_info field contains a list of private_sizes that need signing, and the private_prefix is used to alter the path when signed CloudFront URLs have been configured.

amazonS3_cache

This postmeta table record with meta_key of amazonS3_cache is the cache record associated with posts and pages that have been filtered at least once. When filters such as the_content are processed and new URLs are found that have not been rewritten from local to cloud storage URLs before, they are added to this record once the rewrite has completed. From then on this record provides a very quick way of finding the delivery provider URL to use instead of the local media URL.

When a Media Library item is permanently deleted then WP Offload Media purges any amazonS3_cache records that contain its local URL.

Rewriting Delivery Provider URLs to Local URLs

When a post or page is being updated in WordPress, WP Offload Media makes sure to rewrite any cloud storage URLs that have been pasted into the content back to the local URL that WordPress and third-party themes and plugins expect to see in the database content. This rewriting only happens if WP Offload Media recognizes that the URL points to an offloaded item managed in the as3cf_items table. It does not rewrite other remote URLs.

To all intents and purposes this is just a reverse process to the rewriting of local URLs to Delivery Provider URLs, but triggered from a different set of hooks.
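A minimal Python sketch of the reverse direction is shown below. It assumes a lookup of known delivery provider URLs to local URLs, standing in for the as3cf_items lookup, and leaves every other remote URL untouched. The function name, the regular expression and the lookup dictionary are hypothetical.

```python
import re
from typing import Dict

# A loose URL pattern that stops at whitespace, quotes and angle brackets.
URL = re.compile(r"https?://[^\s\"'<>]+")


def provider_to_local(content: str, known: Dict[str, str]) -> str:
    """Rewrite recognized delivery provider URLs back to their local URLs."""
    # Unknown URLs fall through the lookup and are returned unchanged.
    return URL.sub(lambda match: known.get(match.group(0), match.group(0)), content)


if __name__ == "__main__":
    known = {
        "https://cdn.hellfish.media/wp-content/uploads/2020/11/12131415/puppies-300x300.jpg":
            "https://hellfish.media/wp-content/uploads/2020/11/puppies-300x300.jpg",
    }
    content = (
        '<img src="https://cdn.hellfish.media/wp-content/uploads/2020/11/12131415/puppies-300x300.jpg"> '
        '<img src="https://example.com/kittens.jpg">'
    )
    print(provider_to_local(content, known))
```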

Rewriting Delivery Provider URLs to Local URLs: Hooks

content_save_pre

This and similar filters for other parts of a post are used by WordPress before content is saved to the database. WP Offload Media uses these hooks to rewrite any delivery provider URLs it finds in the content back to their local URL.

as3cf_filter_post_provider_to_local

This filter is implemented by WP Offload Media so that a third-party theme or plugin can apply it if they do not want to use the content_save_pre filter or other standard WordPress filters that may have undesirable side effects. It is processed in the same way as content_save_pre and other filters to rewrite delivery provider URLs back to their local media URLs.

Rewriting Delivery Provider URLs to Local URLs: Data

The same data is used as for Rewriting Local URLs to Delivery Provider URLs: Data. Even amazonS3_cache records in postmeta are updated to ensure any media URLs added to content that do not conform to the WordPress standard have a mapping to their ID made available for quick lookup.

More Information

This document could not hope to ever be a complete record of how WP Offload Media works, as it is simply too complex. However, if you feel something important is missing from this doc, or are confused by anything, please do contact us and we’ll try to help, and likely update this doc too.