Page Caching: Varnish vs Nginx FastCGI Cache

Varnish has long been a part of the stack we use here on our site, handling full-page caching, but after some benchmarking it looks like Nginx FastCGI Cache is actually a better choice.

If you followed along with Ashley’s Hosting WordPress Yourself series, you’re probably familiar with the stack but here’s a diagram as a refresher:

Ashley's Server Architecture

Nginx employs FastCGI Cache for full-page caching, PHP-FPM processes the PHP, Redis manages the object cache, and MySQL is at the very back. This site (deliciousbrains.com) has been running a similar stack since I set it up two and a half years ago:

This Site's Server Architecture

There are a couple of minor differences:

  • Linode and CentOS rather than Digital Ocean and Ubuntu
  • HTTP requests are served as-is rather than being redirected to HTTPS
  • Apache mod_php rather than PHP-FPM
  • APC rather than Redis for object cache

But the biggest difference is definitely the presence of Varnish and using it over FastCGI Cache for full-page caching. Because Varnish doesn’t support HTTPS, we have Nginx sitting in front of it, handling the HTTPS bits and proxying requests for Varnish. Varnish then proxies requests to Apache on the backend.
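
To make that concrete, here is a minimal sketch of how such a chain can be wired up on the Nginx side. The ports (6081 for Varnish, 8080 for Apache) and the certificate paths are assumptions for illustration, not a copy of our actual config:

# Nginx terminates HTTPS and passes plain HTTP on to Varnish
server {
    listen 443 ssl;
    server_name deliciousbrains.com;

    # Certificate paths are placeholders
    ssl_certificate     /etc/nginx/ssl/deliciousbrains.com.crt;
    ssl_certificate_key /etc/nginx/ssl/deliciousbrains.com.key;

    location / {
        proxy_pass http://127.0.0.1:6081;   # Varnish's default listen port
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
    }
}

Varnish is then configured with Apache (e.g. 127.0.0.1:8080) as its backend.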

Why Apache?

As I’ve written previously, I had doubts about managing my own server, especially one that my company and its employees depend on to bring in revenue. I went with Apache because I knew it well. Nginx + PHP-FPM was relatively new in comparison and I didn’t know it at all.

I was also seeing “502 Bad Gateway” errors from Nginx (often the result of a PHP-FPM timeout) now and then while browsing the web. This could have been due to lots of things, but I assumed it was mainly due to the set_time_limit() PHP function not having any effect when running PHP-FPM. Definitely a strike against.

I’ve since played with Nginx + PHP-FPM a bit and have more confidence using it in production. Especially where I have control over PHP-FPM’s timeout settings.
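
To give an idea of what I mean by timeout settings, here is a rough sketch of the knobs involved; the values and locations are illustrative, not what this site actually runs:

; PHP-FPM pool config (e.g. www.conf): kill a worker if a single request
; runs longer than this, independent of max_execution_time / set_time_limit()
request_terminate_timeout = 60s

# Nginx side: how long to wait for a response from PHP-FPM before giving up
fastcgi_read_timeout 60s;

With both of these under my control, a hung request gets cut off predictably instead of surfacing as a mystery error.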

Why Varnish?

I had been reading good things about Varnish. It did full-page caching really well and could handle massive traffic without breaking a sweat. Web hosts were adding it to their setups. I believe WP Engine was using it.

So in 2012 I decided it would be worth a try. I set up an Amazon EC2 instance with Varnish and ran my blog on it for a year. I got comfortable with Varnish. It worked well.

When I set up the server for deliciousbrains.com, I felt good about running it there as well. And like I said above, I didn’t know Nginx well at all, let alone the FastCGI Cache options. It’s also possible that FastCGI Cache wasn’t mature in 2012, I’m not sure.

Why Varnish Today?

If I set up a new server today, would I still go with Varnish?

When I first reviewed part 4 of Ashley’s series, I thought Varnish would destroy FastCGI Cache in performance because it stores cached pages in memory while FastCGI Cache stores them on disk.

Well, after asking Ashley about that, it turns out you can configure the FastCGI Cache folder to be stored in memory. Time for some benchmarking!

FastCGI Cache (Disk) Benchmark

I tried using a benchmark similar to the one Ashley used in his article. I ran it against a Digital Ocean 2GB server with Ubuntu, ramping from 1 to 1,000 concurrent users over a 60-second period. All requests were over regular HTTP.

FastCGI Cache (Disk) Benchmark Results

A similar result to Ashley’s benchmark. The response time was double Ashley’s, but this was likely due to the difference in distance between my Digital Ocean data center (Toronto) and the origin of the benchmark requests (Virginia). Ashley was running his between Ireland and London. I also transferred 3x the data in my test, so that could have had an impact as well. In any case, that’s our baseline for the following benchmarks.

Configure FastCGI Cache to Use Memory

To get the FastCGI Cache folder served from memory, we can use Linux’s tmpfs filesystem to mount the folder in RAM. In my favorite editor, I edited /etc/fstab and added the following line:

tmpfs /sites/bradt.ca/cache tmpfs defaults,size=100M 0 0

This allows up to 100MB of cache files to be stored in memory for quicker access. Obviously you can tweak the folder path and size for your own site.

Now I saved and quit the editor and ran the following command:

mount -a

This mounts all filesystems configured in /etc/fstab. Now let’s see if it worked:

df -ah

You should see your folder in the output.
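
Note that the Nginx config doesn’t need to change at all; the fastcgi_cache_path directive simply points at the folder that is now backed by tmpfs. A sketch with illustrative zone names and sizes (not necessarily the exact values used here):

# Cache files land in the tmpfs mount; keys_zone holds the cache keys in shared memory
fastcgi_cache_path /sites/bradt.ca/cache levels=1:2 keys_zone=bradtca:10m max_size=100m inactive=60m;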

FastCGI Cache (Memory) Benchmark

Running the same benchmark now, we get the following:

FastCGI Cache (Memory) Benchmark Results

Surprisingly it performed a tiny bit worse, but so slightly that it’s not significant at all (i.e. if we ran it again it would probably perform slightly better).

Ashley guesses that this is likely because “disk” is actually solid-state (SSD) and much closer to memory performance than ye olde spinning hard disks. Sounds like a good guess to me.

Varnish Benchmark

Now time to try the same benchmark with Varnish. For my Varnish config, I’m using this template.
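
That template handles cookie exceptions, purging, and so on, but at its heart it just tells Varnish where the backend lives. A stripped-down sketch, assuming Apache is listening on port 8080 (the linked template and my actual ports may differ):

vcl 4.0;

# Varnish accepts requests proxied from Nginx and forwards cache misses to Apache
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}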

Varnish Benchmark Results

Again we see a dip in performance, but this time it’s significant. The average response time has gone up from 82ms to 100ms. Let’s take a look at the response times over time to see what happened:

Varnish Benchmark Response Times

Looks like things are fine up to around 500 concurrent users, then it starts to struggle a bit. Looking at New Relic, Varnish causes a pretty big CPU spike:

Varnish CPU Spike

HTTPS

As a side note, HTTPS has a huge impact on the server at this scale (1,000 concurrent users). Running the first FastCGI Cache (Disk) benchmark over HTTPS I got a staggeringly different result:

HTTPS Benchmark Results

And looking at the response times you can see there’s a pretty solid relationship between the response time and the number of concurrent users.

HTTPS Benchmark Response Times

Looking at New Relic we can see that Nginx is causing a big CPU spike:

Nginx CPU Spike

So it looks like Nginx requires some significant extra CPU to do the encryption/decryption. There’s also some extra network latency for each request to do the TLS handshake.
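
The handshake cost is per connection rather than per request, which is why Nginx settings like the following are commonly used to soften it. This is only a sketch of the general technique, not something that was tuned for this benchmark:

# Reuse TLS sessions and keep connections alive so fewer full handshakes are needed
ssl_session_cache   shared:SSL:10m;
ssl_session_timeout 10m;
keepalive_timeout   65;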

To put things in perspective here, the server did fine up to 250 concurrent users before it started to gradually get slower. That’s pretty damn good for $20/month (Digital Ocean 2GB).

Conclusion

I’m pretty surprised that Varnish has been outperformed here. This was supposed to be the main selling point for having Varnish in the stack.

Varnish is definitely more configurable, but how much configuration do you really need? FastCGI Cache is plenty flexible for most sites.

I guess if you were interested in fragment caching, you might want to use Varnish so that you could use its Edge Side Includes (ESI) feature. Rachel Andrew did a nice article for Smashing Magazine about ESI if you’re interested in an intro.

There is one nice feature that Varnish added in 4.0 that FastCGI Cache doesn’t have yet: the ability to serve stale content when the cache has expired and trigger a fetch of fresh content. With FastCGI Cache, when you request content and the cache has expired, you have to wait for it to fetch fresh content from the backend which slows down the request a lot. It’s the difference between a 40ms response time and 200ms. Five times slower is huge.

You can add updating to the fastcgi_cache_use_stale directive, but it doesn’t solve this problem. The first request that hits expired content still has to wait for the fresh content to be fetched from the backend and any concurrent requests that come in during that time will get the stale content. A nice feature, but again, doesn’t solve the problem.
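
For reference, this is the directive in question. A sketch of how it is typically set, inside a server block that already has fastcgi_cache configured:

# While one request refreshes an expired entry, serve other clients the stale copy;
# also fall back to stale content if the backend errors out or times out
fastcgi_cache_use_stale updating error timeout http_500 http_503;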

With what I know right now, I wouldn’t bother adding Varnish to the next server stack I set up. It’s just not worth the extra daemon running on the server, the extra configuration to manage, and most importantly the extra point of failure. But who knows, I could learn something tomorrow that changes my mind.

Have you used Varnish and/or Nginx FastCGI Cache? Maybe you’ve used something else for page caching? Let us know in the comments.

About the Author

Brad Touesnard

As founder of Delicious Brains Inc., Brad wears many hats; from coding and design, to marketing and partnerships. Before starting Delicious Brains, Brad was a busy freelance web developer, specializing in front-end development.

  • martyspellerberg

    I worked on a site that used Varnish and found it broke WP’s ability to password protect individual posts. But I’ve not heard complaints about that in discussions of WP + Varnish, leading me to suspect we must have had it misconfigured in some way.

    • Yes, Varnish config would very likely need to be modified to allow for password protected posts.

  • Renato Alves

    I’ve been using Nginx FastCGI Cache for 6 months now (I used to use Apache), and I’m pretty happy with the performance results so far. It’s fast, even with TLS together with SPDY/HTTP/2 (although our concurrent numbers don’t go higher than 50 at a time).

    Since I began learning Nginx, I’ve heard Varnish was good, but after reading your article and others about it, I don’t see the value in allocating time to learn it when Nginx gives almost the same result for a simple website setup.

  • Renato Alves

    Brad, I forgot to ask, in the article you mentioned you use APC for object cache, how do you compare it with Redis? Did you find APC better/faster than Redis?

  • webdeme

    Thanks Brad for your insights.
    I compared the FastCGI Cache (Disk) vs FastCGI Cache (Memory) benchmark on a DO 1GB droplet.
    FastCGI Cache (Memory) performed better.

    • Marechal Golfinho

      You must have messed up the config a lot… I get 5ms response times on my setup

  • David Majchrzak

    The reason why you don’t get much of a difference between memory and disk cache is because of Linux file cache. As long as you have memory to spare, files will end up in the file cache and be just as fast as your tmpfs. Thomas Krenn has written an article explaining it here: https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics

    • Patrick

      So just to confirm, Linux/Ubuntu will automatically keep files in memory if they are used often enough? No need for tmpfs even if we have the RAM?

      • mre

        exactly.

    • Stephen

      I’m wondering if Linux defaults to caching in the CPU’s cache (L1, L2), which is faster than RAM if you access it a lot. I’ve seen many tests where this was the case.

  • This was a good read Brad! Loved the stack graphs! Did you try HTTPS with Varnish? Did you get a similar degradation in performance?

    I’m not too surprised that Varnish did worse here. Although, I’m not sure what your varnish configuration was (outside the vcl file). How much memory did you give it? I think the default is 512 megs.

    Varnish is its own daemon (and only runs on RAM) so you might have run into issues because of configuration. I’m not sure it’d beat nginx caching in a basic scenario like benchmarking one page either way. But those were things I was wondering when I saw the slowest page at 162ms. That shouldn’t happen (I think).

    I think it’s important to stress that the nginx caching by default is pretty dumb. It won’t do that well once you introduce plugins that set cookies and things like that. It’s discussed here:
    https://www.nginx.com/blog/nginx-caching-guide/

    That said, Trellis does a great job at mitigating these issues in their config. They filter out the admin, xmlrpc and things like that. They also check for logged in cookies and such so you don’t serve cached content to them. You can see it here:
    https://github.com/roots/trellis/blob/11516d12e42355d0304eb4fb8c15be7ca8fcf6d4/roles/wordpress-setup/templates/wordpress-site.conf.j2#L66-L85

    • As I said in the article, Varnish doesn’t support HTTPS. You have to run Nginx, Pound, or some other proxy in front of it to handle the HTTPS bit.

      With regards to cookies and Nginx, yes, it’s a good idea to ignore the Set-Cookie header as Ashley did in his config and then choose which cookies will force a cache skip.

      To be fair, Varnish’s default is no better. Out-of-the-box it will just cache everything, even logged in users. You need to tell it what not to cache. The template I linked to in the article has a ton of configuration.

      The bottom line is that no matter what caching server you use, you’re going to need to tweak the config to specify exactly what to cache and what not to cache.

  • The VARNISH_STORAGE param makes a difference sometimes. Depending on what defaults you used, VARNISH_STORAGE with no changes will store the cache on disk (like FastCGI Cache), and performance will be the same, because both will use the “same backend storage”.

    This could be a good reason why performance was around the same. But even if you have VARNISH_STORAGE using file and not malloc, the performance difference might not be that great because of the Linux Page Cache during benchmarks.

    What was configured in your VARNISH_STORAGE param?

    • From what I just gathered it’s the -s switch in DAEMON_OPTS that sets the options for the location and size of the cache. I believe VARNISH_STORAGE is just a bash variable set in some configurations and then used as the value of the -s switch. In any case, this is what I was running…

      DAEMON_OPTS="-a :6081 \
      -T localhost:6082 \
      -f /etc/varnish/default.vcl \
      -S /etc/varnish/secret \
      -s malloc,256m"

      Also, I’m pretty sure I didn’t touch this, so maybe memory is the default storage location now for Varnish 4.1?

      • "-s malloc,256m" means that you allocated 256 MB of memory for the cache. Once it’s filled, Varnish starts evicting objects. With a 2GB server, I would have given it a gig.

        I’m still not sure that it would have made a difference in the end. But it’s possible that it’s the reason why performance went down at around 500 users. It hit the memory limit and started evicting things.

        • The size of the cache shouldn’t have impacted the test here because I was benchmarking a single page. That is, it was the same page requested from all concurrent users. Basically a Slashdot Effect simulation. The memory the varnishd process is able to use (2GB minus what other processes are using) and the max number of connections (1,000) are more relevant to this benchmark.

          • Yeah, you’re right. Just trying to think what the issue could have been. I don’t think you hit max connections either.

  • pescadito

    ok, thanks for sharing,
    but what about the Cloudways stack? It claims to use nginx + varnish + apache + memcached + mysql.
    Have you tested and benchmarked it? What’s your opinion?

  • pescadito

    or even the EasyEngine stack, do you think they are similar solutions?

  • Good one

  • This is almost identical to the stack we set up (nginx fastcgi cache, php-fpm, memcached) to be able to handle ~50MM monthly page views for less than $1000 a month in server costs. It's battle tested and it works so well once set up.

    Great write-up!

  • maryan

    The disk cache is not a real disk cache… the first request hits the disk, but all other requests are served from RAM.

    This is done by the Linux kernel. Disk cache is always the best choice.

  • Mike P.

    Varnish will compress objects out of the box and then serve them per the Accept-Encoding request header. Is NGINX doing any content compression? It looks like you used a service called blitz.io to perform the benchmarking, is that correct? Does it accept gzip-encoded content? If not, that could be the cause of the CPU spikes.

    Were you able to confirm that Varnish was serving cache hits (I’m guessing “hits” on the benchmark results means a page hit, not a cache hit)? I don’t know how the FastCGI Cache works, but if the backend isn’t setting a Cache-Control header, and the filetype doesn’t match the whitelist in that VCL, Varnish will not cache the page.

  • Waqas

    Reading your posts makes me think that you’re a server scientist and I’m a stupid person who never saw anything other than shared hosting. Now I ask myself how I’m gonna digest everything. Hosting sites nowadays is no fun. I’ll have to hire a person to do all this. Or maybe you can release ready-made snapshots with different configurations for different hosts like Linode, DigitalOcean and Vultr etc. Of course we will pay for them.

  • Btw, I just found this nginx setting a few months ago, which is not active by default: proxy_cache_lock
    This will kill the Varnish “killer feature” you were talking about in the conclusion.

    • Interesting, but from what I can tell proxy_cache_lock doesn’t quite solve the problem. Even with it enabled the first request will still have to wait for the fresh content to be fetched from the backend.

      With proxy_cache_lock enabled, if multiple clients request a file that is not current in the cache (a MISS), only the first of those requests is allowed through to the origin server. The remaining requests wait for that request to be satisfied and then pull the file from the cache. Without proxy_cache_lock enabled, all requests that result in cache misses go straight to the origin server.

      https://www.nginx.com/blog/nginx-caching-guide/

      • Right, but the aim of providing fast responses to many users on an up-to-date/dynamic webpage is achieved. If you don’t use this, all requests will go through the proxy and kill your (web) server behind it. In the end the user doesn’t know whether he is the first or not, so it doesn’t matter (I think).
        OK, it is quite cool that Varnish itself triggers this request in the background (as far as I understand), especially for pages with huge/aggressive caching but fewer users. But in the end it’s not enough for the websites I’m responsible for to justify complex VCL instead of just configuring nginx a bit.

  • nginx 1.10 with http2, preload, upstream, php7.0-fpm, fastcgi_cache and open_file_cache is the key for your new Porsche. 😉

  • Dante F. B. Colò

    it would be interesting to see Lighttpd , Litespeed, Monkey and GWAN in this comparison

  • You could use NGINX and FastCGI Cache for handling the cache/static content/SSL, PHP-FPM and Apache for dynamic content, and perhaps add Redis for database caching and MariaDB as the database. I think that would be the best setup out there for both SSL and non-SSL sites.

  • Marechal Golfinho

    This is very strange. I have a 1GB VPS setup on DigitalOcean running NGINX + FastCGI Cache and I get 5ms response times… 80ms is the normal uncached response time…

    • Bill Pearce

      That’s likely network latency, no?

      • Marechal Golfinho

        I don’t think network latency and wait time (response time) are the same thing… the response time is the time the server takes to process the response… My latency to this server is about 110ms, so there’s no link to this awesome result of 5ms.

  • Stephane

    Regarding the SSL issue: for heavy-traffic web sites we should use an SSL offloading appliance, not the web server’s capabilities.

  • Ashraf Amayreh

    Amazing write-up. Perhaps you could just set up a crontab to re-request the content and cause it to re-cache? I mean, if that one request is really too expensive. I don’t know how to make them sync though. I mean you would need to call it at exactly the right time.