| HN Mirror

	by marginalia_nu 475 days ago
	You basically need proof-of-work to make this work. Idling a connection is not computationally expensive, so it is not a deterrent. It's a shitty solution to an even shittier reality.

xena 475 days ago

Main author of Anubis here:

Basically what they said. This is a hack, and it's specifically designed to exploit the infrastructure behind industrial-scale scraping. They usually have a different IP address do the scraping for each page load _but share the cookies between them_. This means that if they use headless Chrome, they have to do the proof-of-work check every time, which scales poorly with the rates I know the headless Chrome vendors charge for compute time per page.
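
For readers unfamiliar with the idea, here is a minimal sketch of a hashcash-style proof-of-work check. It is an illustration under stated assumptions, not Anubis's actual code: the SHA-256 hash, the leading-zero-bits difficulty rule, and the names `solve`, `verify`, and `leading_zero_bits` are all hypothetical choices made for this example. The point it shows is the asymmetry: the client grinds through many hashes, while the server verifies with one.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce (expected cost grows as 2**difficulty)."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


# Hypothetical flow: the server issues a random challenge, the client solves it.
challenge = secrets.token_hex(16)
nonce = solve(challenge, difficulty=12)
assert verify(challenge, nonce, difficulty=12)
```

In a scheme like this, a client that presents a valid solution would typically receive a cookie and skip the work on later requests. A scraper that cannot reuse that pass has to pay the `solve` cost on every page load, which is the scaling problem described above.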

ArinaS 475 days ago

Is there any particular date/time you'll introduce a no-JS solution?

And are you going to support older browsers? I tested Anubis with https://www.browserling.com with its (I think) standard configuration at https://git.xeserv.us/xe/anubis-test/src/branch/main/README.... and apparently it doesn't work with Firefox versions before 74 and Chromium versions before 80.

I wonder if it works with something like Pale Moon.

xena 475 days ago

It will be sooner if I can get paid enough to be able to quit my day job.

vhcr 475 days ago

I used to have an ISP that would load balance your connection between different providers; this meant that pretty much every single request would use a different IP. I know it's not that common, but that would mean real users would find pages using Anubis unusable.

lifthrasiir 475 days ago

Do you think that, if this behavior of Anubis becomes well known and Anubis cookies are specifically handled to avoid pathological PoW checks, Anubis would need a significant rework? Because if that's indeed true, this hack wouldn't last much longer, and I have no further ideas for avoiding user-visible annoyances.

solid_fuel 475 days ago

Well, if they rework things so that requests all originate from the same IP address or a small set of addresses, then regular IP-based rate limits should work fine, right?

The point is just to stop what is effectively a DDoS because of shitty web crawlers, not to stop the crawling entirely.
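
As a rough illustration of what "regular IP-based rate limits" could look like, here is a minimal sliding-window sketch. Everything in it is an assumption for the example: the class name `SlidingWindowLimiter`, the in-memory dictionary, and the limits used are hypothetical, and a real deployment would more likely rely on a reverse proxy or a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Example with explicit timestamps (documentation-range IP addresses).
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10.0)
first = limiter.allow("203.0.113.7", now=0.0)
second = limiter.allow("203.0.113.7", now=1.0)
third = limiter.allow("203.0.113.7", now=2.0)   # over budget for this IP
other = limiter.allow("198.51.100.1", now=2.0)  # a different IP has its own budget
```

Note that the limiter keys only on the address, so every client sharing one public IP also shares one budget, and it has to keep state per address, whereas a scraper that rotates addresses gets a fresh budget each time.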

lifthrasiir 475 days ago

> Well, if [...], then regular IP-based rate limits should work fine, right?

I'm not sure. IP-based rate limits have a well-known issue with shared public IPs, for example. Technically they are also more resource-intensive than cryptographic approaches (but I don't think that's a big issue in IPv4).

dharmab 475 days ago

> then regular IP-based rate limits should work fine, right?

These are also harmful to human users, who are often behind CGNAT and may be sharing a pool of IPs with many thousands of other ISP subscribers.

specialist 475 days ago

> Weigh the soul of incoming HTTP requests using proof-of-work to stop AI crawlers

Based on the comments here, it seems like many people are struggling with the concept.

Would calling Anubis a "client-side rate limiter" be accurate (enough)?

runxiyu 475 days ago

Probably not
