by int3 698 days ago
Shouldn't all sites have some kind of bandwidth / cost limiting in place? Not to say that AI crawlers shouldn't be more careful, but there are always malicious actors on the internet, so it seems foolish not to have some kind of defense in place.
3 comments

The big three cloud providers (AWS/GCP/Azure) have collectively decided that you don't want to set a spending limit actually, so they simply don't let you.
The big three cloud providers are the most expensive by a factor of 10-100x, and shouldn't be used under any circumstances unless you really, really need specific features from them.
Isn't running a webserver on those kind of a silly idea for that reason?
It's harder to do right than you think. The first dynamic bandwidth (and concurrent connection) limiter that I wrote was, in part, to protect a site against Google!
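
For anyone wondering what such a limiter looks like at its simplest, here is a rough sketch, not the limiter described above. It assumes a single process, a site-wide token bucket for bytes, and a plain counter for open connections. `GlobalLimiter` and all of its parameters are made-up names, and the limits are static rather than dynamic.

```python
import threading
import time


class GlobalLimiter:
    """Site-wide (not per-IP) cap on bytes per second and open connections.

    Hypothetical sketch: fixed limits, one process, no persistence.
    """

    def __init__(self, bytes_per_sec, burst_bytes, max_connections,
                 clock=time.monotonic):
        self.rate = bytes_per_sec          # refill rate of the bucket
        self.capacity = burst_bytes        # maximum tokens the bucket holds
        self.tokens = float(burst_bytes)   # start with a full bucket
        self.max_connections = max_connections
        self.active = 0                    # currently open connections
        self.clock = clock                 # injectable so it can be tested
        self.last = clock()
        self.lock = threading.Lock()

    def try_open(self):
        """Reserve a connection slot; False means the site is at capacity."""
        with self.lock:
            if self.active >= self.max_connections:
                return False
            self.active += 1
            return True

    def close(self):
        """Release a connection slot reserved by try_open()."""
        with self.lock:
            self.active = max(0, self.active - 1)

    def try_send(self, nbytes):
        """Spend nbytes of budget; False means the caller should wait or shed."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if nbytes > self.tokens:
                return False
            self.tokens -= nbytes
            return True
```

Because the budget is shared by every client, it does not matter how many IP addresses the traffic comes from. The hard part is everything this leaves out: deciding who gets the remaining budget, and adjusting the limits as load changes.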
They say this:

> We have IP-based rate limiting in place for many of our endpoints, however these crawlers are coming from a large number of IP addresses, so our rate limiting is not effective.

Do you have something else in mind? Just shut down the whole site after a certain limit?
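
One middle ground between "no limit" and "shut down the whole site" would be a tiered budget. The sketch below is purely illustrative: `EgressBudget`, the soft/hard thresholds, and the idea of a "priority" request (say, a logged-in user) are all assumptions, not anything the site in question does.

```python
from enum import Enum


class Action(Enum):
    SERVE = "serve"      # handle the request normally
    DEGRADE = "degrade"  # e.g. cached pages only, or ask the client to slow down
    REJECT = "reject"    # e.g. respond 503 with a Retry-After header


class EgressBudget:
    """Tracks bytes sent in a billing period against two hypothetical limits."""

    def __init__(self, soft_limit_bytes, hard_limit_bytes):
        if soft_limit_bytes > hard_limit_bytes:
            raise ValueError("soft limit must not exceed hard limit")
        self.soft = soft_limit_bytes
        self.hard = hard_limit_bytes
        self.used = 0

    def record(self, nbytes):
        """Add the size of a response that was just sent."""
        self.used += nbytes

    def decide(self, is_priority=False):
        """Pick an action for the next request based on usage so far."""
        if self.used >= self.hard:
            return Action.REJECT       # budget exhausted: nobody is served
        if self.used >= self.soft and not is_priority:
            return Action.DEGRADE      # past the soft limit: shed anonymous load
        return Action.SERVE

    def reset(self):
        """Start a new billing period."""
        self.used = 0
```

Usage would be to call `decide()` before handling each request and `record()` after sending the response. Anonymous crawler traffic gets degraded first, and the full shutdown only happens at the hard limit.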