Death by stupid microservices. Even at 1.5 million pages, and with the traffic they're talking about, this could easily be hosted on a fixed $80/month Linode.
This isn't specific to microservices. I've seen two organizations with a lot of content have their websites brought to their knees because multiple AI crawlers were hitting them.
One of them was pretending to be a very specific version of Microsoft Edge, coming from an Alibaba datacenter. Suuuuuuuuuuuuuuuuuure. Blocked its IP range and about ten minutes later a different subnet was hammering away again. I ended up just blocking based off the first two octets; the client didn't care, none of their visitors are from China.
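For anyone wondering what "blocking based off the first two octets" amounts to, it's just a /16 match. Here's a minimal sketch in Python, assuming IPv4 only. The function names are mine, and the address is a documentation-range placeholder (RFC 5737), not the actual crawler's IP. In practice you'd do this in the firewall or the proxy, not in application code.

```python
import ipaddress

def first_two_octets_block(ip: str) -> ipaddress.IPv4Network:
    """Return the /16 network sharing the first two octets with `ip`."""
    # strict=False lets us pass a host address and get its enclosing network
    return ipaddress.ip_network(f"{ip}/16", strict=False)

def is_blocked(client_ip: str, blocked_networks) -> bool:
    """True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in blocked_networks)

# Placeholder address (hypothetical), standing in for one offending crawler IP
blocked = [first_two_octets_block("203.0.113.7")]

print(blocked[0])                          # the /16 covering that address
print(is_blocked("203.0.200.1", blocked))  # different subnet, same /16
print(is_blocked("198.51.100.1", blocked)) # unrelated range
```

A /16 covers 65,536 addresses, so this is a blunt instrument. It only made sense here because the client had no legitimate visitors in that range.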
I’ve dealt with AI crawlers. I’ve even seen 8 different AI crawlers at once. And yes, some have been very aggressive, and I have even blocked a few that were particularly bad (ignoring robots.txt rules). But their traffic is a tiny fraction of what my infrastructure sees on a regular basis.
A well optimized platform, with good caching, shouldn’t really struggle with a few crawlers.
For a public website? Well, if you don't have thousands of pages, the solution is as simple as installing Varnish, which is good practice anyway. If you actually have enough unique paths for an unauthenticated botnet to saturate the cache, well, that's a bit more complicated.
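To make the "enough unique paths" point concrete, here's a toy sketch in Python. It's a plain LRU cache standing in for a caching proxy, not Varnish's actual implementation, and the page counts and capacity are made-up numbers. It counts how many requests fall through to the backend.

```python
from collections import OrderedDict

class LRUCache:
    """Toy page cache: evicts the least recently used path when full."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0  # each miss is a request the backend has to render

    def get(self, path: str) -> str:
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as recently used
            return self.entries[path]
        self.misses += 1
        body = f"<html>{path}</html>"       # stand-in for an expensive render
        self.entries[path] = body
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry
        return body

def backend_requests(paths, capacity: int) -> int:
    """Replay a sequence of requested paths and count backend renders."""
    cache = LRUCache(capacity)
    for path in paths:
        cache.get(path)
    return cache.misses

# Hypothetical small site: 100 pages, crawled 10 times over
small = [f"/page/{i}" for i in range(100)] * 10
# Hypothetical large site: 5000 unique paths, crawled twice in order
large = [f"/page/{i}" for i in range(5000)] * 2

print(backend_requests(small, capacity=1000))
print(backend_requests(large, capacity=1000))
```

With the small site everything fits, so only the first pass reaches the backend. With the large one, a sequential crawl keeps evicting pages before they're requested again, so the cache barely helps.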
Yeah the playbook for serverless is to target developers that don't know anything about infrastructure, lock them in with proprietary APIs, and then hit them with a huge bill once they have any real traffic.
If you're using nginx as a proxy like the above commenter suggested, and you're serving static/cached pages (which should be possible for most public pages?), it can do over 10k RPS even on my N100 mini PC (the limit there is actually the 1 Gbit NIC).
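Back-of-the-envelope for the NIC limit, as a rough Python sketch. It ignores TCP/TLS/HTTP header overhead, and the function names are just mine, so treat the result as an upper bound rather than a benchmark.

```python
NIC_BITS_PER_SEC = 1_000_000_000  # 1 Gbit/s link

def max_rps(response_bytes: float, link_bits_per_sec: float = NIC_BITS_PER_SEC) -> float:
    """Upper bound on responses/sec the link can carry at a given response size."""
    return link_bits_per_sec / 8 / response_bytes

def max_response_bytes(rps: float, link_bits_per_sec: float = NIC_BITS_PER_SEC) -> float:
    """Largest average response size that still fits `rps` on the link."""
    return link_bits_per_sec / 8 / rps

print(max_response_bytes(10_000))  # bytes per response at 10k RPS
print(max_rps(12_500))             # RPS ceiling for 12.5 KB responses
```

So at 10k RPS a 1 Gbit link tops out around 12.5 KB per response on average. Bigger pages hit the NIC ceiling sooner; smaller or compressed ones leave headroom.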
One of them was pretending to be a very specific version of Microsoft Edge, coming from an Alibaba datacenter. Suuuuuuuuuuuuuuuuuure. Blocked its IP range and about ten minutes later a different subnet was hammering away again. I ended up just blocking based off the first two octets; the client didn't care, none of their visitors are from China.
All of this was sailing right through Cloudflare.