by buro9 3576 days ago
I've faced DoS attacks for years as I run internet forums.

The simple advice for layer 7 (application) attacks:

1. Design your web app to be incredibly cacheable

2. Use your CDN to cache everything

3. When under attack, seek to identify the site (if you host more than one) and the page that is being attacked. Force-cache it via your CDN of choice.

4. If you cannot cache the page then move it.

5. If you cannot cache or move it, then have your CDN/security layer of choice issue a captcha challenge or similar.

The simple advice for layer 3 (network) attacks:

1. Rely on your security layer of choice; if it's not working, change vendor.

On the L3 stuff, when it comes to DNS I've had some bad experiences (Linode, oh they suffered), some pretty good experiences (DNS Made Easy) and some great experiences (CloudFlare).

On L7 stuff, there are a few things no one tells you about. For example, if your application is backed by AWS S3 and serves static files from it, the attack can be on your purse, as the bandwidth costs can really add up.

It's definitely worth thinking of how to push all costs outside of your little realm. A Varnish cache or Nginx reverse proxy with file system cache can make all the difference by saving your bandwidth costs and app servers.

I personally put CloudFlare in front of my service, but even then I use Varnish as a reverse proxy cache within my little setup to ensure that the application underneath it is really well cached. I only have about 90GB of static files in S3, and about 60GB of that is in my Varnish cache, which means when some of the more interesting attacks are based on resource exhaustion (and the resource is my pocket), they fail because they're probably just filling caches and not actually hurting.

The places you should be ready to add captchas as they really are uncacheable:

* Login pages

* Shopping Cart Checkout pages

* Search result pages

Ah, there's so much one can do, but generally... designing to be highly cacheable and then using a provider who routinely handles big attacks is the way to go.

3 comments

Uh, stupid question, but how do you cache a website like, for example, this comment thread on Hacker News? Suppose a DDoSer calls this comment thread a lot of times. The request has to go through to the server because when I hit F5 or post a comment myself, I see the comments in realtime. How do you handle that exactly? Does caching for a few seconds help already, or does the backbone push updated sites to the CDN server? I have no experience in DDoS mitigation.
Cache everything a guest accesses for 5 minutes or more. Vary on the specific cookie that represents a signed-in user.

None of my guests have noticed this, and it has increased most of my analytics numbers as my pages are faster too.

The signed-in users, they get the dynamic pages.

But now the cookie that identifies the user is what you use to correlate any attack traffic. The attacker is forced to (somewhat) identify themselves, and you can then revoke their authentication status or ban the account.

Finally you captcha and/or rate-limit the login page.

This is effectively what I do on my sites: the pages themselves and the underlying API all cache if the cookie or access token is absent.

This is trivial to do within the code, but can be harder to do with the CDN/security layer (which needs to support a "vary on cookie" or "bypass cache on cookie" or equivalent).
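To make the "within the code" part concrete, here is a minimal Python sketch of the guest/signed-in split described above. The cookie name `session` and the helper names are assumptions made up for illustration; the 5 minute guest TTL is the figure from the comment. The edge or reverse proxy still has to honour these headers, and it has to bypass or vary on the same cookie.

```python
from http.cookies import SimpleCookie
from typing import Optional

SESSION_COOKIE = "session"   # hypothetical name of the signed-in cookie
GUEST_TTL_SECONDS = 300      # "5 minutes or more" for guests


def is_signed_in(cookie_header: Optional[str]) -> bool:
    """True if the request carries a non-empty session cookie."""
    if not cookie_header:
        return False
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    return SESSION_COOKIE in cookies and cookies[SESSION_COOKIE].value != ""


def cache_control_for(method: str, cookie_header: Optional[str]) -> str:
    """Pick a Cache-Control value: guests get shared caching, users do not."""
    if method.upper() not in ("GET", "HEAD"):
        return "no-store"                      # never cache writes
    if is_signed_in(cookie_header):
        return "private, no-store"             # dynamic page for signed-in users
    return f"public, max-age={GUEST_TTL_SECONDS}"
```

A guest GET ends up with a publicly cacheable response, while any request carrying the session cookie skips the shared cache and can be correlated to an account.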

The important thing you need to assess is how critical it is that clients receive fresh data.

You can imagine that for a real time service it would be better to provide a timeout immediately rather than providing stale data.

HN is an example of a near on-line site where some delay is perfectly acceptable. No one cares that they're receiving a 2 second old page; it's better for the site users to receive old data fast rather than new data slow.

If you use nginx, the following directives would help out significantly (if I remember them correctly):

    proxy_cache_use_stale updating;
    proxy_cache_lock on;
    proxy_cache_lock_timeout 1s;

This config allows nginx to fetch cache updates while serving clients and when fresh data is received from the upstream application server it'll use that immediately.

If that's wrong hopefully someone can correct the conf.
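For anyone who wants to see the idea outside of nginx, here is a rough single-threaded Python sketch of "serve stale while one request refreshes". It is only loosely modelled on the behaviour described above and says nothing about nginx internals; the class and method names are made up, and a real implementation would need locking and a timeout for a refresh that never completes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    body: str
    expires_at: float
    updating: bool = False   # True while one caller is refreshing this entry


class StaleWhileUpdatingCache:
    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.entries: Dict[str, Entry] = {}

    def lookup(self, key: str, now: float) -> Tuple[Optional[str], bool]:
        """Return (body, should_refresh) for this request."""
        entry = self.entries.get(key)
        if entry is None:
            return None, True              # cold miss: caller must fetch
        if now < entry.expires_at:
            return entry.body, False       # fresh hit
        if entry.updating:
            return entry.body, False       # stale, but a refresh is in flight
        entry.updating = True              # this caller takes the refresh
        return entry.body, True            # others keep getting the stale body

    def store(self, key: str, body: str, now: float) -> None:
        """Record a fresh upstream response and clear the updating flag."""
        self.entries[key] = Entry(body, now + self.ttl)
```

The point is that once an entry exists, at most one request per expiry goes upstream, and everyone else gets an answer immediately.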

What you can do with a site like HN will be different than if you're a shopping site getting DDoSed on Black Friday by a competitor.

You can put the whole of HN into read only mode if needed and it'll have no real impact; disallowing purchases on MyAmazonCompetitor.com would be catastrophic.

Literally only cache for 1 or 2 seconds at a time.

Lots of people use page caching to speed up their website, but that's a mistake, since caching means stale data on dynamic sites. Caching should only be used to solve resource issues, not latency issues.

Your entire site should be fast already without caching. This comments page should only take a few milliseconds to generate. If it doesn't, then something's wrong with the database queries.

I will never understand how some sites take hundreds of milliseconds to generate a page.

Make your comments system a static site generator, so that each comment generates a static HTML page and you serve that statically. 4chan does this.
If you're getting more traffic than 1 request/s, it's less work to generate a static cached version on the cadence of ~1 second than to dynamically generate the content for each request.
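A quick Python sketch of that claim, assuming a hypothetical `render` function that stands in for the expensive page generation. With a 1 second TTL the backend renders a given page at most about once per second, however many requests arrive. The clock is injectable only so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Dict, Tuple


class MicroCache:
    """Cache rendered pages for a very short TTL (about 1 second)."""

    def __init__(
        self,
        render: Callable[[str], str],
        ttl: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}
        self.renders = 0   # how many times the backend actually did work

    def get(self, path: str) -> str:
        now = self._clock()
        hit = self._store.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                  # served from cache, no render
        body = self._render(path)          # expensive path, once per TTL
        self.renders += 1
        self._store[path] = (now, body)
        return body
```

If 100 requests for the same thread land inside one second, `renders` goes up by one, not by 100.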
Have you heard about cache busting? Someone just needs to request a page that's not cached and the request will always hit your web servers.
You'd be surprised how few attacks I've personally seen vary that much. But yes, it happens. Good applications put their identifiers in their paths and ignore the querystrings, and most CDN/security providers let you configure their layer to ignore querystrings entirely.

Of course, this is precisely the attack that works on a search page, hence the advice above to be ready to captcha that if you haven't.

Cache anything GETable. For everything else, you need to think about how to validate the good traffic (trivially computable CSRF tokens help) and captcha the rest.

404s, 401s, etc... they should cost the underlying server as little resource as possible and also cache their result at an applicable cache layer (404s at the edge and 401s internally, 403s at the edge if possible, etc).

If there are 90GB of static files, and 60GB are in the Varnish cache, cache busting will be pretty ineffective.
If there's any dynamic content and the request hits that, Varnish cache will be pretty ineffective.
Actually Varnish is great here: you normalise the requests and retain only the querystring parameters that are valid for your application, filtering out (removing) all those that are not valid.

The key thing is, you know your application, and you know what the valid keys are and the valid value ranges. If you can express that in your HTTP server and discard requests then it can be very cheaply done.

A forum really doesn't have that many distinct URLs, and so this is easily done. It would be harder on a much more complex application, but the original question related to these smaller side-project applications.
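The normalisation described above would normally live in the Varnish or HTTP server config, but the logic is easy to sketch in Python. The parameter names `page` and `sort` and their valid ranges are assumptions for a made-up forum; the point is that unknown keys and out-of-range values are dropped and the rest are sorted, so cache-busting variants collapse onto one cache key.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: valid keys and a check for each key's valid values.
VALID_PARAMS = {
    "page": lambda v: v.isdecimal() and len(v) <= 4 and int(v) >= 1,
    "sort": lambda v: v in {"new", "top"},
}


def normalise_url(url: str) -> str:
    """Drop unknown or invalid query parameters and sort what remains."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in VALID_PARAMS and VALID_PARAMS[key](value)
    ]
    kept.sort()   # stable ordering so equivalent URLs share one cache key
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )
```

So `/forum/42?cb=8812&page=2` and `/forum/42?page=2&junk=1` both normalise to `/forum/42?page=2` before the cache lookup.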

Caching does not necessarily mean more speed. Sometimes it can make things slower.

1. Get from cache

2. Determine if cached value is valid

3. Query data store

4. Put data store value in cache

5. Return data

Instead of just getting it directly. In order to be able to cache you need to think about good cache invalidation. And client side caching won't work against malicious users.
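The five steps above, written out as a small Python sketch. The names `get_with_cache`, `is_valid` and `query_store` are placeholders, and a plain dict stands in for the cache. On a miss or an invalid entry the request pays for the cache lookup, the validity check and the store query, which is the extra work being described.

```python
from typing import Callable, Dict


def get_with_cache(
    key: str,
    cache: Dict[str, str],
    is_valid: Callable[[str, str], bool],
    query_store: Callable[[str], str],
) -> str:
    cached = cache.get(key)                          # 1. get from cache
    if cached is not None and is_valid(key, cached):  # 2. is it still valid?
        return cached
    value = query_store(key)                         # 3. query data store
    cache[key] = value                               # 4. put value in cache
    return value                                     # 5. return data
```

Whether this is a net win depends on the hit rate and on how cheap the validity check is compared with the store query.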

Those are problems the CDN solves for you.
CDN is only for static content, and I assumed all your static content would already be on a CDN... it's standard practice.
CDN is NOT only for static content. Minimally cacheable content, or content that is cacheable based on cookie value, can be cached on a CDN. Also, running ALL traffic through a CDN (like CloudFlare or Akamai) allows you to do traffic optimization, FEO, DDoS protection, and much more.