by tomschlick 3567 days ago
As someone who has never used Varnish but has used Nginx's cache to some degree... what's the benefit of placing Varnish in the middle vs. going with Nginx?
8 comments

If you use it for caching, nginx is lacking something quite important at the moment: stale-while-revalidate (https://tools.ietf.org/html/rfc5861)

Aka serving a cached response while the cache is getting refreshed (even to the request that initiated the refresh). Currently, when nginx has to refresh the cache, it will "hang" the current response until the cache is refreshed (the requests that follow, even while the cache is being refreshed, will get the stale version though).
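
To make the behaviour described above concrete, here is a minimal Python sketch of the stale-while-revalidate idea. It is not how Varnish or nginx implement it; `SWRCache`, `fetch`, `ttl` and `wait_for_refresh` are names invented for this illustration, and `fetch` stands in for whatever slow call reaches the origin.

```python
import threading
import time


class SWRCache:
    """Toy cache: serve stale values immediately, refresh in the background."""

    def __init__(self, fetch, ttl):
        self._fetch = fetch          # hypothetical origin call: key -> value
        self._ttl = ttl              # seconds a value counts as fresh
        self._lock = threading.Lock()
        self._store = {}             # key -> (value, stored_at)
        self._refreshing = {}        # key -> in-flight refresh thread

    def _refresh(self, key):
        value = self._fetch(key)     # slow call, done outside the lock
        with self._lock:
            self._store[key] = (value, time.monotonic())
            self._refreshing.pop(key, None)

    def get(self, key):
        with self._lock:
            entry = self._store.get(key)
            if entry is not None:
                value, stored_at = entry
                stale = time.monotonic() - stored_at >= self._ttl
                if stale and key not in self._refreshing:
                    # Only one refresh per key; the caller does not wait for it.
                    worker = threading.Thread(target=self._refresh, args=(key,))
                    self._refreshing[key] = worker
                    worker.start()
                return value         # fresh or stale, never blocks on origin
        # Cold miss: nothing to serve yet, so this caller has to wait.
        self._refresh(key)
        with self._lock:
            return self._store[key][0]

    def wait_for_refresh(self, key):
        """Helper for demos: block until a background refresh of key is done."""
        with self._lock:
            worker = self._refreshing.get(key)
        if worker is not None:
            worker.join()
```

The point of the sketch is the `return value` inside the lock: the request that notices the entry is stale still gets the old value straight away, and only a cold miss pays the origin's latency.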

It's the difference between your client being amazed by how fast your API is and "wtf, why does page X take 3s to render from time to time" comments.

There is an openresty project that provides this https://github.com/pintsized/ledge - also provides ESI and more.
I think "proxy_cache_use_stale updating;" covers this case
Nope: "updating" does what I described. If another request initiated an update and it's in progress, it will serve a stale value, but the request that triggered the update is still "hanging" until the cache is refreshed. So you get that annoying slowdown for at least 1 user (often the CEO of your client company).

The way to work around this issue with nginx (for simple endpoints) is to write scripts that will hit the endpoints to make sure the cache is always hot, but it's a sad, half-working hack (the CEO can still hit that cold cache page himself, if lucky enough).
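
A cache-warming script of the kind described above could look roughly like this Python sketch. The function names `warm` and `http_fetch` and the URL in the usage note are placeholders chosen for illustration; it uses only `urllib` from the standard library.

```python
import urllib.request


def http_fetch(url, timeout=10):
    """Request url through the cache and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def warm(urls, fetch=http_fetch):
    """Hit every URL once so the cache in front of it gets refreshed.

    Returns {url: status or exception} so failures stay visible.
    """
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:   # URLError and timeouts are OSError subclasses
            results[url] = exc
    return results
```

Something like `warm(["http://localhost:8080/api/slow-report"])` (a made-up endpoint) would be run from cron more often than the cache lifetime. As noted above, this only narrows the window in which a real user can hit an expired entry; it does not close it.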

There's a $500 bounty for this [1], but it doesn't look like anyone has taken it on.

[1] - https://www.bountysource.com/issues/972735-proxy_cache_use_s...

The user who posted the bounty has not had any activity on bountysource outside of that bounty, as far as I can tell from their profile [0], and it has been two years since he and others talked about it in the comments (not counting the comment six months ago from someone else, since that was not directed at the poster of the bounty). Anyone thinking of fixing the issue might want to check whether the bounty is still up before getting to work. If the bounty is your goal, I mean.

[0]: https://www.bountysource.com/people/23342-bdavis

If you're already using Nginx and its caching subsystem is working for your use case(s), then I wouldn't worry about it. A typical web stack is composed of load balancing, SSL termination, caching and compression, and static and dynamic content serving. Nginx is capable of all these things (including dynamic content serving using OpenResty).

Once you start breaking your stack apart to 1) add redundancy, 2) isolate hardware for different workloads, and 3) scale the components independently of each other, the situation gets a little more interesting. Sure, you could run different clusters of Nginx for each of SSL termination, static content, cache and compress, etc. and have them reverse proxy to each other (and some people probably do this), but when you get to this evolution of your architecture, it's worth evaluating the different options for each layer of the stack. There's an argument to be made that e.g. HAProxy is a better load balancer, Nginx is a better static content server, Pound is a better SSL terminator, Varnish is a better caching and compressing reverse proxy, etc. (N.B. that I am not saying these things, I am just saying that they can be said.) If you're not serving static content at all—let's say you're presenting an API written in Elixir/Phoenix—it may not make sense to go with Nginx in the first place!

Regarding Varnish, specifically, I've found that it offers unparalleled power and flexibility when it comes to caching and compression. Yes, that power and flexibility comes at the cost of more complexity, but it's there when you need it. It offers different storage backends, including a pure memory backend; the ability to serve gzipped content directly from cache (for supported clients); PURGE and BAN HTTP verbs; synthetic responses; query string sorting; the ability to serve stale content while updating; ESI; and probably a whole bunch of other stuff I'm forgetting and/or have never used. Nginx and Nginx Plus may support some or all of these things, built-in or as modules... I'm not sure.

Faster and far more configurable: the configuration language (VCL) compiles to C, and you can inline bits of C if you want.

https://www.varnish-cache.org/trac/wiki/ArchitectureInlineC

Faster in theory, maybe. In production nginx has always been faster, more stable, and lighter on resources. I've used both extensively. Varnish has more caching features and configuration for sure.
Out of interest, do you have any metrics to support this?

It would be interesting to see data that compares the two, and see how close they come in different scenarios, and how tuning/configuration might affect the performance.

I've used both in production. They're both fast enough that their performance won't be your bottleneck. One's probably faster than the other by some percentage, but whatever you do, you're going to have some other problem that's bigger than that percentage.

At that point, talking about metrics is likely procrastination.

I don't think your experience alone is enough to say whether performance differences are significant or not.

While in your experience the performance differences were negligible, that doesn't mean that will hold true in all usage scenarios. For example, maybe they perform similarly when caching many small files, but one struggles with serving longer running requests.

Oh, sure. And one will be faster than the other in any given scenario. Not "faster like c++ is faster than ruby", though, it's "faster like this c++ compiler is faster than that one".
For caches much larger than available memory nginx will win because it caches to static files and uses sendfile:

http://www.bbc.co.uk/blogs/internet/entries/17d22fb8-cea2-49...

If you need just a generic cache, then using varnish, nginx or httpd are all fine choices. The issue is when you start scaling to obscene amounts of traffic. At that point, having a single component attempting to do too many things just doesn't work. varnish was/is designed as an extremely fast and performant cache which, when correctly configured, simply beats the pants off of nginx or httpd. It's easily as stable as nginx and httpd and as compliant as httpd (nginx is a bit less so).

For caches, the bugaboo is latency, and event architectures are not good choices when small-as-possible latency is an issue. It is here that heavily threaded architectures really shine. Sure, you can likely get by with less resources w/ events, but (1) performance will suffer and (2) as thread implementations improve, threading will only get better and more "stingy" re: resources.

* ESI support is really nice (if you're an API designer, you can really go nuts on this and make a super simple API do some very complex things or expressive composition from lots of simple calls)

* More fine-grained control of your caching is possible

* Easier to express normalisation of requests (increase your cache hit rate and protect your underlying origin from malicious requests by discarding cache busters)

* Inline C means you can do things like move your authentication to the edge

* In theory if you have enough RAM you can go faster than nginx's on-disk... in practice the sweet spot for the gain is small and they're both on par
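
The request-normalisation bullet above can be illustrated with a short Python sketch of what such a rule does to a URL before it is used as a cache key. In Varnish this logic would live in VCL rather than Python; the function name `normalise` and the parameter names in `CACHE_BUSTERS` are assumptions picked for the example, not a standard list.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of parameters that do not change the response.
CACHE_BUSTERS = {"_", "cb", "nocache", "utm_source", "utm_medium", "utm_campaign"}


def normalise(url):
    """Return a canonical form of url to use as the cache key."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in CACHE_BUSTERS
    ]
    params.sort()                          # same params, any order -> same key
    return urlunsplit((
        parts.scheme,
        parts.netloc.lower(),              # host names are case-insensitive
        parts.path or "/",
        urlencode(params),
        "",                                # fragments never reach the server
    ))
```

With this, `?b=2&a=1&_=12345` and `?a=1&b=2` map to the same key, so a client appending random cache busters ends up hitting the cache instead of the origin.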

But then... it comes with disadvantages too. Like most Varnish services would never let you do the nice Inline C stuff because no-one in their right mind would run untrusted code in their environment where it could impact another customer. If you see a provider do this (at any price point), avoid them.

Since 3.0 you have VMODs, which make up for the (well-founded) lack of Inline C support at most providers. These Varnish modules extend VCL with C, C++ or even Rust libraries in a safer manner: https://varnish-cache.org/vmods/

If you make your own VMOD (it can take from a few hours to days), make sure to send a PR and add it to the directory above (IOW: share it) :)

"In theory if you have enough RAM you can go faster than nginx's on-disk... in practice the sweet spot for the gain is small and they're both on par"

Not in theory, but in reality. Even if you don't, varnish's file on-disk implementation is just as good as nginx's if not better.

The idea behind a cache is to increase performance. If that is the priority then you would pick a design and implementation geared towards that problem set. Sure you can say "Well, nginx or httpd is 'good enough'" and that would be right. But when nginx or httpd aren't good enough, you need varnish. And even when you don't need varnish, it is often a better architectural design to break out specific functionality (best of breed).

Thanks for the detailed list! ESI looks pretty sweet so I'll have to take a look at that as we scale out our API in the coming months.
One or two orders of magnitude greater speed, lots of configuration options for caching policies, the ability to split a request into several subrequests (ESI) and some other nice things.
Not sure if the performance gains are still valid. There are reports that state the contrary: https://deliciousbrains.com/page-caching-varnish-vs-nginx-fa... Also, Nginx has something similar to ESI called SSI (http://serverfault.com/questions/406103/main-differences-bet...)
There are quite a few places for improvement in that test (e.g. tweaking the number of threads in Varnish), and the tests are by no means like for like.

I would take it with a grain of salt, or do your own tests if you really care.

Yes, important point: always test with your own infrastructure and run your own benchmark. Nothing replaces your own experimentation and experience. What works for one company does not necessarily work for others.
Take a look at `varnishlog` while you have traffic flowing thru - you'll feel like you've seen the light :) Also, read the vsl-query docs: https://www.varnish-cache.org/docs/trunk/reference/vsl-query...

In terms of introspectability, Varnish (afaik) is without equal.

Totally agreed!

I believe that the power and utility of VSL are so underestimated and overlooked that I don't know where to start...

The ability to manually purge the cache by URL, with wildcards, is pretty handy.
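
As a rough illustration of wildcard purging, here is a toy Python cache keyed by URL. It is not Varnish's mechanism (Varnish has its own purge and ban facilities); the class `PurgeableCache` and its methods are invented for this sketch, and the shell-style patterns come from Python's `fnmatch` module, where `*`, `?` and `[` are special characters.

```python
import fnmatch


class PurgeableCache:
    """Toy URL-keyed cache with shell-style wildcard purging."""

    def __init__(self):
        self._store = {}

    def put(self, url, body):
        self._store[url] = body

    def get(self, url):
        return self._store.get(url)

    def purge(self, pattern):
        """Drop every entry whose URL matches pattern; return removed URLs."""
        doomed = [u for u in self._store if fnmatch.fnmatchcase(u, pattern)]
        for url in doomed:
            del self._store[url]
        return sorted(doomed)
```

For example, `purge("/api/users/*")` drops every cached user endpoint in one go while leaving `/api/posts/...` entries alone.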
1: Performance. varnish provides much lower latency.

2: More control.

3: Better compliance.