Basically what they said. This is a hack, and it's specifically designed to exploit the infrastructure behind industrial-scale scraping. They usually have a different IP address do the scraping for each page load _but share the cookies between them_. This means that if they use headless chrome, they have to do the proof of work check every time, which scales poorly with the rates I know the headless chrome vendors charge for compute time per page.
I used to have an ISP that would load balance your connection between different providers, this meant that pretty much every single request would use a different IP. I know it's not that common, but that would mean real users would find pages using anubis unusable.
Do you think that, if this behavior of Anubis gets well-known and Anubis cookies are specifically handled to avoid pathological PoW checks, does Anubis need a significant rework? Because if it's indeed true this hack wouldn't last much longer and I have no further idea to avoid user-visible annoyances.
Well, if they rework things so that requests all originate from the same IP address or a small set of addresses, then regular IP-based rate limits should work fine right?
The point is just to stop what is effectively a DDoS because of shitty web crawlers, not to stop the crawling entirely.
> Well, if [...], then regular IP-based rate limits should work fine right?
I'm not sure. IP-based rate limits have a well-known issue with shared public IPs for example. Technically they are also more resource-intensive than cryptographic approaches too (but I don't think that's not a big issue in IPv4).
Basically what they said. This is a hack, and it's specifically designed to exploit the infrastructure behind industrial-scale scraping. They usually have a different IP address do the scraping for each page load _but share the cookies between them_. This means that if they use headless chrome, they have to do the proof of work check every time, which scales poorly with the rates I know the headless chrome vendors charge for compute time per page.