by cestith 2508 days ago

Two layers of the same cache can be beneficial, even if they're both using Varnish. Let's walk through a couple of request scenarios. I'll assume I'm both running the application/inner cache/load balancers and testing the request flows myself for simplicity of pronouns.

I request image42 and it's in the outer cache. I get served from the outer cache.

I request image127 and it's a cache miss on this server. It asks its backend, which is another cache, and this time it's a cache hit since it hadn't timed out there yet.

I request image128, and then my browser requests the same image again from the same backend. It doesn't even have to hit my load balancer the second time.

I request image2049. It's a miss on the outer cache. It's a miss on the inner cache. It gets generated by processing in a primary application. I then request it again, and I hit a different frontend cache. It's a miss in this frontend, but this cache is hopefully refreshed from that inner layer of cache rather than going all the way back to the application. If the load balancer pins traffic based on the ultimate end-user's IP to a particular inner-circle Varnish box via MRU then the chances are quite high that's what happens.

I request image4095 and it has expired from the inner cache, but is still unexpired in the outer cache so it never gets beyond the CDN.
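The scenarios above can be sketched as a toy simulation. This is only an illustration, not anyone's actual setup: the class name `TTLCache`, the two frontends `outer_a` and `outer_b`, the TTL values, and the logical `now` clock are all assumptions made up for the example.

```python
class TTLCache:
    """Tiny TTL cache in front of a backend (another cache or the app)."""

    def __init__(self, name, ttl, backend):
        self.name = name
        self.ttl = ttl          # seconds an entry stays fresh (assumed values)
        self.backend = backend  # callable(key, now, trail) -> value
        self.store = {}         # key -> (value, expires_at)

    def get(self, key, now, trail):
        entry = self.store.get(key)
        if entry is not None and entry[1] > now:
            trail.append(f"{self.name}:hit")
            return entry[0]
        trail.append(f"{self.name}:miss")
        value = self.backend(key, now, trail)
        self.store[key] = (value, now + self.ttl)
        return value


def application(key, now, trail):
    # Stand-in for the expensive image processing step.
    trail.append("app:render")
    return f"bytes-of-{key}"


# One shared inner cache behind two outer (CDN-like) frontends.
inner = TTLCache("inner", ttl=3600, backend=application)
outer_a = TTLCache("outer_a", ttl=60, backend=inner.get)
outer_b = TTLCache("outer_b", ttl=60, backend=inner.get)


def request(frontend, key, now):
    """Return the list of layers a request touched, in order."""
    trail = []
    frontend.get(key, now, trail)
    return trail


print(request(outer_a, "image2049", now=0))  # miss everywhere, app renders
print(request(outer_b, "image2049", now=1))  # other frontend, inner cache hit
print(request(outer_a, "image2049", now=2))  # served by the outer cache
```

The second request is the image2049 case: a different frontend misses, but the shared inner layer answers before the application is touched. How often that happens in practice depends on the real TTLs and traffic pattern, which this sketch does not model.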

1 comments

maxk42 2508 days ago

I understand what's happening -- there's no need to explain. Running a dedicated varnish instance for the handful of requests that have a cache miss is pointless and I'd be willing to bet he didn't benchmark it.

In 99% of workflows, what's going to happen on a cache miss at the CDN is you'll hit varnish, which will also suffer a cache miss since it's a rarely-requested resource that's being requested. That 1% of cases it helps with are the few that have been requested recently enough to have not been evicted from the Varnish server but not so recently that they haven't been evicted from the CDN. It's a vanishingly-small amount of traffic. Most of what your varnish server will be doing is making requests to your main server while doubling your bandwidth and server costs. And latency.

Not practical at all.

That infrastructure would've been better spent on another server in the load-balancer rotation -- which is also unnecessary since I run a nearly-identical offering with many times the traffic and do it off of a single server + CDN so I speak from experience.

Not to mention the most ridiculous turtle in this stack: Spaces itself is a CDN. That means on every cache miss the traffic gets bounced from a CDN (Cache #1) to a load balancer (does that imply multiple Varnish instances?) which bounces it to Varnish (aka Cache #2) to a server which makes one request to Postgres, then another to Redis (Cache #3) and if it finally finds its file it redirects to Spaces (Cache #4). Your real traffic coming in is almost all going to get served by the CDN -- and when it's not on the CDN, it's going to be from a page that only gets hit once a week or once a month or less. That means if it's not in your outer cache it's not going to be in any of your inner caches, since it's long-tail traffic. And the long tail is quite a lot of traffic.

Again: I have an image site that gets much more traffic than picsum and I run it off of a single server + CDN. My biggest cost by far is bandwidth. He's not doing himself any favors with all this over-engineering. My service has a CDN which -- upon cache miss -- serves a flat file from my server. Done. 4TB of data transfer monthly and .75TB of flat files stored across multiple volumes. New files are processed / generated at upload and that's the end of the story. I'm just some random shmuck on the internet so you don't have to believe me but I've just had an epiphany in reading this story by some guy who happens to do exactly what I do and not as well but with many, many more steps and I'm realizing I'm an expert on shit I don't even think about being an expert on while other people who think they're experts -- aren't.

cestith 2507 days ago

You do make some solid points. You assume, however, that there's a bandwidth cost between the load balancer and the backend which won't be true for everyone. You also don't consider the cache behind the load balancer might be much larger and have a much longer TTL than the CDN's cache. Economics of this sort of setup are entirely different if you're putting every piece on a cloud instance rather than having a rack somewhere with your private data flowing for free over your own switch.

dmarby 2507 days ago

Author of the post here, figured I'd clarify some things since there seem to be some major misconceptions present.

First off, I don't claim to be an expert, I find that a pretty arrogant title for anyone to use. I'd like to think I know a thing or two about building highly scalable webservices however, and of course I'm always open to the opportunity to learn if I'm doing things incorrectly.

That said, Picsum is what I use to play around with new technologies and try new things since it's high-traffic enough that I can get some real data on how things perform. Is it very over-engineered? Absolutely, but that's part of the fun.

When it comes to Picsum, the reason for not pre-processing all the images is that there are simply too many combinations of sizes and variations you can request through the API. For every image, there are 5001 * 5001 * 22 variations that can be requested, and in total, we have just under a thousand source images.
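To put that figure in perspective, here is the arithmetic worked out. The `source_images = 1000` value is an assumed upper bound standing in for "just under a thousand", so the total is a rough ceiling rather than an exact count.

```python
# Variations per source image, using the numbers quoted above.
per_image = 5001 * 5001 * 22

# Assumption: round "just under a thousand" up to 1000 as an upper bound.
source_images = 1000
total = per_image * source_images

print(f"{per_image:,} variations per image")   # 550,220,022
print(f"{total:,} variations at most overall")  # 550,220,022,000
```

That is roughly 550 million possible outputs per source image, which is why generating everything up front is not an option here.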

As for running Varnish behind our CDN, this is done for a couple of reasons:

- We can make sure that a given image is only processed once at a time, even though the CDN might request it multiple times before its cache has been filled.

- We can apply optimizations, such as sorting and filtering the query parameters for variations, to achieve a better cache rate. This is not possible to do with the CDN provider we use.
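Both points can be illustrated with a short Python sketch. It is not the Varnish configuration Picsum uses; it only shows the two ideas in isolation. The parameter whitelist `ALLOWED_PARAMS`, the example URL shape, and the `Coalescer` class are hypothetical names invented for the example.

```python
import threading
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical whitelist: which query parameters affect the rendered image.
ALLOWED_PARAMS = {"blur", "grayscale"}


def cache_key(url):
    """Drop unknown query parameters and sort the rest, so equivalent
    URLs map to the same cache entry."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = sorted((k, v) for k, v in pairs if k in ALLOWED_PARAMS)
    query = urlencode(kept)
    return parts.path + ("?" + query if query else "")


class Coalescer:
    """Let only one caller render a given key at a time; concurrent
    callers for the same key wait and reuse that result."""

    def __init__(self, render):
        self.render = render
        self.lock = threading.Lock()
        self.in_flight = {}  # key -> {"done": Event, "value": result}

    def get(self, key):
        with self.lock:
            slot = self.in_flight.get(key)
            leader = slot is None
            if leader:
                slot = {"done": threading.Event(), "value": None}
                self.in_flight[key] = slot
        if leader:
            try:
                slot["value"] = self.render(key)
            finally:
                # Sketch only: if render raises, waiters receive None.
                with self.lock:
                    del self.in_flight[key]
                slot["done"].set()
            return slot["value"]
        slot["done"].wait()
        return slot["value"]


# Two spellings of the same request collapse to one key.
print(cache_key("/id/1/200/300?grayscale&blur=2&utm_source=x"))
print(cache_key("/id/1/200/300?blur=2&grayscale"))
```

A real cache would also store the finished result, so later requests skip rendering entirely. The coalescer only covers the window in which the first render is still in progress.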

The resources it uses are negligible, the extra latency within the cluster is vanishingly small, and it saves us a lot of extra processing. Every service within the Kubernetes cluster runs at least two replicas, varnish included, for redundancy and to distribute the load. We're not using separate servers for each layer/component, that'd be wasteful.

As for bandwidth costs, there's no cost for the bandwidth between the CDN and the load balancer, as DigitalOcean does not charge for load balancer bandwidth. There's also no cost for anything behind the load balancer, as this is all internal traffic, either within DigitalOcean or within the Kubernetes cluster itself.

Talking about Spaces, I think you might be confused. Spaces is an object storage service, which also happens to have optional CDN capability built-in. Picsum only uses the object storage part, for storing the source images that are used for processing. The reason we use Redis to cache said source images is to avoid having to fetch them from Spaces on every request, as this is rather slow comparatively. An important distinction here is that Spaces/Redis store and cache the source images, not the processed ones, which are cached by Varnish and the CDN.

As an aside, since you seem to think that comparing numbers for services with vastly different needs and use cases is worthwhile, Picsum serves a bit over 8TB of traffic a month, and costs less than your setup to run.
