by Kenji 3582 days ago
Uh, stupid question, but how do you cache a website like, for example, this comment thread on Hacker News? Suppose a DDoSer requests this comment thread many times. The request has to go through to the server, because when I hit F5 or post a comment myself, I see the comments in real time. How do you handle that exactly? Does caching for a few seconds already help, or does the backend push updated pages to the CDN server? I have no experience in DDoS mitigation.
6 comments

Cache everything a guest accesses for 5 minutes or more. Vary on the specific cookie that represents a signed-in user.
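
A minimal in-process sketch of that rule, in Python. The cookie name `session`, the `serve` and `render` functions, and the in-memory dict are hypothetical stand-ins; a CDN or reverse proxy would express the same rule through its own vary-on-cookie or bypass-on-cookie setting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical name of the cookie that marks a signed-in user; use whatever
# your session framework actually sets.
SESSION_COOKIE = "session"
GUEST_TTL_SECONDS = 300  # "5 minutes or more" for anonymous visitors

# url -> (expires_at, rendered_page), shared by all guests
_guest_cache: Dict[str, Tuple[float, str]] = {}


def serve(url: str, cookies: Dict[str, str], render: Callable[[str], str],
          now: Optional[float] = None) -> str:
    """Return the page for `url`, caching it only for guests."""
    now = time.monotonic() if now is None else now
    if SESSION_COOKIE in cookies:
        # Signed-in users bypass the cache and always get the dynamic page.
        return render(url)
    hit = _guest_cache.get(url)
    if hit is not None and hit[0] > now:
        return hit[1]  # still-fresh guest copy
    page = render(url)
    _guest_cache[url] = (now + GUEST_TTL_SECONDS, page)
    return page
```

However many anonymous requests arrive for one URL, the sketch renders it at most once per five minutes; only requests carrying the cookie reach `render` every time.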

None of my guests have noticed this, and it has increased most of my analytics numbers as my pages are faster too.

The signed-in users, they get the dynamic pages.

But now the cookie that identifies the user is what you use to correlate any attack traffic: the attacker is forced to (somewhat) identify themselves, and you can then revoke their authentication status or ban the account.
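
One way the correlation step could look, assuming a hypothetical per-session sliding-window counter: once a session ID sends more dynamic requests than a threshold, it goes into a revoked set. The thresholds, `record_request`, and `revoked` are illustrative assumptions, not anything a particular framework provides.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set

# Hypothetical thresholds: more than MAX_REQUESTS dynamic requests within
# WINDOW_SECONDS from one session is treated as abusive.
MAX_REQUESTS = 100
WINDOW_SECONDS = 10.0

_recent: Dict[str, Deque[float]] = defaultdict(deque)
revoked: Set[str] = set()


def record_request(session_id: str, now: Optional[float] = None) -> bool:
    """Record one request; return False if the session is (or just became) revoked."""
    now = time.monotonic() if now is None else now
    if session_id in revoked:
        return False
    window = _recent[session_id]
    window.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while window and window[0] <= now - WINDOW_SECONDS:
        window.popleft()
    if len(window) > MAX_REQUESTS:
        # A real site would also invalidate the session server-side
        # and flag or ban the account behind it.
        revoked.add(session_id)
        del _recent[session_id]
        return False
    return True
```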

Finally, you put a CAPTCHA on and/or rate-limit the login page.
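
For the rate-limiting half, a per-client token bucket is one common approach (CAPTCHAs are a separate mechanism and aren't shown). This is only a sketch; keying by client IP and the limits below (bursts of 5 attempts, refilled at one every 12 seconds) are assumptions chosen for illustration.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allows bursts of `capacity` attempts, refilled at `rate` attempts per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated: Optional[float] = None

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill for the time elapsed since the last check, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical limits: a burst of 5 login attempts per client IP,
# then one more attempt every 12 seconds.
_login_buckets: Dict[str, TokenBucket] = {}


def login_allowed(client_ip: str, now: Optional[float] = None) -> bool:
    if client_ip not in _login_buckets:
        _login_buckets[client_ip] = TokenBucket(rate=1 / 12, capacity=5)
    return _login_buckets[client_ip].allow(now)
```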

This is effectively what I do on my sites: the pages themselves and the underlying API are all cached if the cookie or access token is absent.

This is trivial to do within the code, but can be harder to do with the CDN/security layer (which needs to support a "vary on cookie", "bypass cache on cookie", or equivalent option).

The important thing you need to assess is how critical it is that clients receive fresh data.

You can imagine that for a real-time service it would be better to return a timeout immediately than to serve stale data.

HN is an example of a near-real-time site where some delay is perfectly acceptable. No one cares that they're receiving a 2-second-old page; it's better for the site's users to receive old data fast than new data slowly.

If you use nginx, the following directives would help significantly (if I remember them correctly):

  proxy_cache_use_stale updating;
  proxy_cache_lock on;
  proxy_cache_lock_timeout 1s;

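This config lets nginx keep serving the stale cached copy to clients while a single request fetches an update from the upstream application server; as soon as the fresh response arrives, it is cached and used. The lock means only one request at a time goes upstream to populate a missing cache entry, while other requests for that entry wait up to the 1s timeout.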
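
For sites not behind nginx, roughly the same behaviour can be approximated in application code. This sketch is not how nginx implements it, and the class name, `ttl`, and `lock_timeout` are hypothetical. An expired entry is refreshed by whichever caller gets the lock while everyone else keeps getting the stale copy; a missing entry makes callers wait up to `lock_timeout`, after which they go upstream themselves without caching, which mirrors what nginx does when `proxy_cache_lock_timeout` expires.

```python
import threading
import time
from typing import Callable, Optional


class StaleWhileUpdatingCache:
    """Caches one upstream response; serves it stale while a single caller refreshes it."""

    def __init__(self, fetch: Callable[[], str], ttl: float, lock_timeout: float = 1.0):
        self.fetch = fetch                # hypothetical call to the upstream app server
        self.ttl = ttl                    # how long a response counts as fresh
        self.lock_timeout = lock_timeout  # analogous to proxy_cache_lock_timeout 1s
        self.value: Optional[str] = None
        self.expires = 0.0
        self._lock = threading.Lock()

    def get(self, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if self.value is not None and now < self.expires:
            return self.value  # fresh hit

        if self.value is None:
            # Nothing cached yet: wait (bounded) for whoever is populating it.
            got_lock = self._lock.acquire(timeout=self.lock_timeout)
        else:
            # A stale copy exists: only try the lock, never wait for it.
            got_lock = self._lock.acquire(blocking=False)

        if not got_lock:
            if self.value is not None:
                return self.value  # someone else is refreshing; serve stale
            return self.fetch()    # lock timed out: go upstream without caching

        try:
            # Re-check: another caller may have refreshed while we waited.
            if self.value is None or now >= self.expires:
                self.value = self.fetch()
                self.expires = now + self.ttl
            return self.value
        finally:
            self._lock.release()
```

With `ttl` set to 1 or 2 seconds, a flood of identical requests costs the upstream roughly one fetch per `ttl`, no matter how many arrive.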

If that's wrong, hopefully someone can correct the config.

What you can do with a site like HN is different from what you can do if you're a shop getting DDoSed on Black Friday by a competitor.

You can put the whole of HN into read only mode if needed and it'll have no real impact; disallowing purchases on MyAmazonCompetitor.com would be catastrophic.

Literally only cache for 1 or 2 seconds at a time.

Lots of people use page caching to speed up their website, but that's a mistake, since caching means stale data on dynamic sites. Caching should only be used to solve resource issues, not latency issues.

Your entire site should be fast already without caching. This comments page should only take a few milliseconds to generate. If it doesn't, then something's wrong with the database queries.

I will never understand how some sites take hundreds of milliseconds to generate a page.

Make your comments system a static site generator, so that each comment generates a static HTML page and you serve that statically. 4chan does this.
If you're getting more traffic than 1 request/s, it's less work to generate a static cached version on the cadence of ~1 second than to dynamically generate the content for each request.
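
A rough sketch of the "regenerate at most once per second" idea: new comments only mark the page dirty, and a background loop calls `maybe_rebuild` to write the HTML file that the web server serves directly. The `StaticPage` class and its parameters are hypothetical; this isn't a description of how 4chan actually does it.

```python
import os
import time
from typing import Callable, Optional


class StaticPage:
    """A comment page re-rendered to a static HTML file at most once per `interval`."""

    def __init__(self, path: str, render: Callable[[], str], interval: float = 1.0):
        self.path = path          # file the web server serves directly
        self.render = render      # hypothetical: builds the full HTML for the thread
        self.interval = interval  # the "~1 second" cadence
        self.dirty = True
        self.last_built = float("-inf")

    def comment_posted(self) -> None:
        # Don't rebuild on every write; just note that the page is out of date.
        self.dirty = True

    def maybe_rebuild(self, now: Optional[float] = None) -> bool:
        """Rebuild if dirty and the interval has passed; return True if rebuilt."""
        now = time.monotonic() if now is None else now
        if not self.dirty or now - self.last_built < self.interval:
            return False
        # Cleared before rendering so a comment posted mid-render triggers another build.
        self.dirty = False
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(self.render())
        os.replace(tmp, self.path)  # atomic swap on the same filesystem
        self.last_built = now
        return True
```

Because the file is written to a temporary path and then renamed with `os.replace`, the web server never serves a half-written page. At R requests/s this costs at most one render per interval instead of R renders per second.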