by VladVladikoff 429 days ago
Death by stupid microservices. Even at 1.5 million pages and the traffic they are talking about, this could easily be hosted on a fixed $80/month Linode.

This isn't specific to microservices. I've seen two organizations with a lot of content have their websites brought to their knees because multiple AI crawlers were hitting them.

One of them was pretending to be a very specific version of Microsoft Edge, coming from an Alibaba datacenter. Suuuuuuuuuuuuuuuuuure. Blocked its IP range and about ten minutes later a different subnet was hammering away again. I ended up just blocking based off the first two octets; the client didn't care, none of their visitors are from China.
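For anyone wondering what "blocking based off the first two octets" amounts to: it is a /16 block. A minimal sketch in Python, assuming IPv4 only. The helper names `block_range_for` and `is_blocked` are made up for illustration, and the address is from a documentation range, not the actual crawler's IP.

```python
import ipaddress

def block_range_for(ip: str, prefix_len: int = 16) -> ipaddress.IPv4Network:
    """Return the network that contains `ip` at the given prefix length.

    prefix_len=16 keeps the first two octets and zeroes the rest.
    """
    # strict=False lets us pass a host address instead of a network address.
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)

def is_blocked(ip: str, blocked: list) -> bool:
    """True if `ip` falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in blocked)

# 198.51.100.23 is a documentation address (TEST-NET-2), used as a stand-in.
blocked = [block_range_for("198.51.100.23")]
```

In practice the resulting CIDR string would go into a firewall or web server deny rule; the check above only shows which addresses the block covers.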

All of this was sailing right through Cloudflare.

I’ve dealt with AI crawlers. I’ve even seen 8 different AI crawlers at once. And yes some have been very aggressive, and I have even blocked some who are particularly bad (ignoring robots.txt rules). But their traffic is a tiny fraction of what my infrastructure sees on a regular basis. A well optimized platform, with good caching, shouldn’t really struggle with a few crawlers.
Honest question, why is rate limiting insufficient?

It can be done in two lines in nginx, which is not just a common web server but is also used as an API gateway or proxy.

You can rate limit by IP pretty aggressively without affecting human traffic.
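To make "rate limit by IP" concrete, here is a rough sketch of a per-IP token bucket in Python. This is not how nginx implements its limiter; it is just one simple way to get the same effect in application code. The class name, the `rate` and `burst` parameters, and the injectable `clock` are all assumptions for illustration.

```python
import time

class PerIpRateLimiter:
    """Token bucket per client IP: `rate` tokens/second, up to `burst` stored."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock      # injectable so the limiter can be tested
        self._state = {}        # ip -> (tokens, time of last request)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        # A new IP starts with a full bucket.
        tokens, last = self._state.get(ip, (float(self.burst), now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._state[ip] = (tokens - 1.0, now)
            return True
        self._state[ip] = (tokens, now)
        return False
```

A request handler would call `allow(client_ip)` and return a 429 when it is false. A real deployment would also need to evict idle entries from `_state`, which this sketch skips.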

One /24 of IPs hammering on your website at a rate-limited 2 rps each is still a combined ~500 requests/s. I’m not sure many sites can sustain that.
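The arithmetic behind that figure, as a quick check in Python. The 2 rps limit is the number from the comment; the network is a documentation range used as a placeholder, and it assumes every address in the block is active.

```python
import ipaddress

# 203.0.113.0/24 is a documentation range (TEST-NET-3), used as a placeholder.
net = ipaddress.ip_network("203.0.113.0/24")
per_ip_rps = 2  # per-IP limit from the comment above

# Every address in the block sending at the per-IP limit.
combined_rps = net.num_addresses * per_ip_rps
print(combined_rps)  # 512
```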
For a public website? Well if you don't have thousands of pages, then the solution would be as simple as installing Varnish, which is good practice anyways. If you actually have enough unique paths for an unauthenticated botnet to saturate, well that's a bit more complicated.
Many sites hosted on Vercel, I suppose. If sites are hosted on nginx/varnish I’d be surprised if they didn’t do an order of magnitude more.
Yeah the playbook for serverless is to target developers that don't know anything about infrastructure, lock them in with proprietary APIs, and then hit them with a huge bill once they have any real traffic.
If you're using nginx as a proxy like the above commenter suggested, and you're serving static/cached pages (which should be possible for most public pages?), it can do over 10k RPS even on my N100 mini PC (the limit there is actually the 1 Gbit NIC).