Hacker News
by antirez 2 days ago
So to stop those energy-hungry LLM companies from scraping your website, you force each browser to compute a lot of hashes in a necessarily energy-hungry loop, creating, at the same time, all kinds of accessibility problems?

I don’t get how people believe there’s a proof-of-work (PoW) function that both:

1. Allows me access on my phone in reasonable time and with reasonable battery use

2. Poses any meaningful challenge to the most compute-resourced organizations on the planet
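To make the trade-off concrete, here is a minimal hashcash-style sketch, assuming a challenge of the form "find a nonce whose SHA-256 hash has at least `difficulty` leading zero bits". It illustrates the general technique only and is not Anubis's actual challenge format; `solve`, `verify` and the 8-byte nonce encoding are hypothetical. The asymmetry is the point: the client expects about 2**difficulty attempts, while the server verifies with a single hash. Each extra bit doubles the expected work for the phone and the data center alike, which is exactly the tension the two requirements above describe.

```python
import hashlib
import os
import time


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() of a nonzero byte is 1..8, so 8 - bit_length() is its leading zeros
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce. Expected attempts are about 2**difficulty."""
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: one hash, no matter how high the difficulty is."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = os.urandom(16)  # hypothetical per-visitor random challenge
    for d in (8, 12, 16):
        start = time.perf_counter()
        nonce = solve(challenge, d)
        elapsed = time.perf_counter() - start
        print(f"difficulty={d} bits nonce={nonce} time={elapsed:.3f}s ok={verify(challenge, nonce, d)}")
```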

I wonder how many cumulative hours of human life have been wasted waiting on Anubis.

There are a lot of people writing really bad scrapers and running them on systems with far from high compute power. This is to prevent DoS caused by those. The big companies are often far more clever: they know they are traversing the whole internet and can come back later.

> I wonder how many cumulative hours of human life have been wasted waiting on writing comments on creamsicle reddit.

I disagree with a lot of the decisions around the design of Anubis... but resisting the current drive of the industry to ruin as much of the good faith resource donations from others is an admirable objective.

The point isn't to increase the amount of work required to the point of exhaustion, it's to require that scripts be able to offer the exact same feature set that browsers offer. The point isn't to make it impossible, it's to make it more expensive than free.

Anubis isn't trying to prevent all scraping, it's trying to reduce the abuse just enough that real requests get their fair share. You don't need to outcompute the botnet, just slow it down a little.
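The "slow them down a little" argument is easy to put in numbers for a hashcash-style challenge. Assuming, purely as an illustration, that a challenge of difficulty d asks for a hash with d leading zero bits and that each attempt behaves like an independent uniform hash, the number of attempts is geometric, and S is the (hypothetical) number of challenges a client has to solve:

```latex
\Pr[\text{one attempt succeeds}] = 2^{-d}
\qquad
\mathbb{E}[\text{attempts per challenge}] = \frac{1}{2^{-d}} = 2^{d}
\qquad
\mathbb{E}[\text{total attempts}] = S \cdot 2^{d}
```

At d = 16 that is 65,536 hashes per challenge, negligible for one visitor, while a crawler forced to solve a million challenges needs about 6.6 × 10^10 hashes. How many solves a real deployment actually forces depends on how long a solved token stays valid, which this sketch does not model.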

I hate seeing the Anubis interstitial too, and I've already complained about it publicly. But it doesn't come close to the frustration of waiting 10s for an SPA to load all of the routes it'll never use before the first redraw. Clearly our industry has also decided latency is a good thing.

> I disagree with a lot of the decisions around the design of Anubis... but resisting the current drive of the industry to ruin as much of the good faith resource donations from others is an admirable objective.

"The road to hell is paved with good intentions" is a phrase for a reason.

Besides, if the author's intentions were pure, it wouldn't infest the sites using it with the author's sexual fetish by default.

> The point isn't to increase the amount of work required to the point of exhaustion, it's to require that scripts be able to offer the exact same feature set that browsers offer.

What browsers? Websites intentionally breaking anything non-mainstream is precisely the problem.

> But it doesn't come close to the frustration of waiting 10s for an SPA to load all of the routes it'll never use before the first redraw.

Many sites only need anti-bot malware because of inefficient design like that.

The vast majority of that compute is locked in AI accelerators that do the inference. That hardware is bad at doing anything else; in fact, crawlers would need more residential proxies rather than more compute in that regard.

> I wonder how many cumulative hours of human life have been wasted waiting on Anubis.

"How dare that mugging victim fight back".

The choice is not between Anubis and no Anubis, the choice is between Anubis and my website going offline because I can't afford the $400/month that AI scrapers would cost me (yes, I checked, and yes, that's the real figure) if Anubis wasn't in front.

Smarter people have put unique captchas (in the form of domain-specific questions) on their websites for a long time.

That makes sense, and I believe you, I'm just surprised it really deters the scrapers.

If it's dumb and it works, is it really dumb?

No, it's not dumb, but I still don't get how it manages to be so light. When I visit an Anubis-guarded site I barely have to wait. Do scrapers really see that little CPU usage or wall time and back off? Or maybe that's just because I'm not visiting sites that are under attack.

It chooses the challenge weight based on signals. If your phone looks like a phone from a residential IP, you get a simple challenge.

If you then spam requests, you might get another, harder challenge.

If you have a data center IP and look like bot traffic, you get a hard challenge out of the gate.

AFAIU after looking at their docs several months ago.
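The escalation described above (a light challenge for a normal-looking phone, a harder one after request spam, a hard one from the start for data-center traffic) can be sketched as a simple scoring function. Every name, signal and number here is hypothetical, and difficulty is expressed in leading zero bits only for illustration; this is not Anubis's actual policy engine or its defaults.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    """Hypothetical signals a PoW gateway might look at (illustrative only)."""
    datacenter_ip: bool       # e.g. IP falls in a known hosting/cloud range
    looks_like_browser: bool  # e.g. headers consistent with a real browser
    recent_requests: int      # requests from this client in the current window


BASE_DIFFICULTY = 12  # hypothetical: cheap challenge for ordinary visitors
HARD_DIFFICULTY = 20  # hypothetical: traffic that already looks automated
SPAM_PENALTY = 4      # hypothetical: extra bits once a client floods requests
RATE_THRESHOLD = 30   # hypothetical: requests per window before escalating


def choose_difficulty(s: RequestSignals) -> int:
    """Pick a challenge weight from request signals (a sketch, not Anubis's policy)."""
    difficulty = BASE_DIFFICULTY
    if s.datacenter_ip or not s.looks_like_browser:
        # Looks like bot traffic: hard challenge out of the gate.
        difficulty = HARD_DIFFICULTY
    if s.recent_requests > RATE_THRESHOLD:
        # Spamming requests: escalate on top of whatever was chosen above.
        difficulty += SPAM_PENALTY
    return difficulty
```

With these made-up numbers, a browser-like client on a residential IP gets 12 bits, the same client flooding requests gets 16, and a data-center client that also floods requests gets 24. Since each bit doubles the expected work, a few bits of escalation is a large difference in cost.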

> "How dare that mugging victim fight back".

If you want to go with that analogy, this would be a case of the mugging victim stabbing random bystanders.

> The choice is not between Anubis and no Anubis, the choice is between Anubis and my website going offline because I can't afford the $400/month that AI scrapers would cost me (yes, I checked, and yes, that's the real figure) if Anubis wasn't in front.

Those aren't the only options and anyone who claims there is no other choice is either dishonest or incompetent.

They have 2 options:

  - Put their ~1kb of text on a ~0kb website, make it cacheable, make hosting it free, make downloading and rendering it instantaneous, make it accessible and let users read it comfortably

  - Set up a CAPTCHA and make the website inaccessible, spy on the users or give their history to trillion-dollar ad companies, and make them wait 10 secs to proceed.

Guess which one HN front-page bloggers choose? I often comment and/or flag them, but they never learn.
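The "make it cacheable" part of the first option mostly comes down to HTTP headers. A minimal sketch, assuming a static page whose content only changes on redeploy: `Cache-Control: public, max-age=...` lets browsers and shared caches (CDNs, proxies) reuse the response, and an `ETag` lets clients revalidate with `If-None-Match` and receive a bodyless 304. The function names and the one-day `max_age` are hypothetical; whether caching alone keeps a site up under scraper load is the commenter's claim, not something this sketch demonstrates.

```python
from __future__ import annotations

import hashlib


def static_headers(body: bytes, max_age: int = 86400) -> dict[str, str]:
    """Headers that let browsers, proxies and CDNs cache a static page."""
    # Strong validator derived from the content itself.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(body)),
        # public: shared caches may store it; max-age is in seconds
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }


def _matches(etag: str, if_none_match: str) -> bool:
    """Weak comparison as used for If-None-Match: ignore a W/ prefix; '*' matches anything."""
    if if_none_match.strip() == "*":
        return True
    candidates = [t.strip() for t in if_none_match.split(",")]
    return any(c.removeprefix("W/") == etag for c in candidates)


def respond(body: bytes, if_none_match: str | None) -> tuple[int, dict[str, str], bytes]:
    """Answer a GET; return 304 with no body if the client's cached copy is current."""
    headers = static_headers(body)
    if if_none_match is not None and _matches(headers["ETag"], if_none_match):
        not_modified = {k: headers[k] for k in ("Cache-Control", "ETag")}
        return 304, not_modified, b""
    return 200, headers, body
```

A repeat visitor (or a well-behaved crawler) that sends back the `ETag` costs the server one hash comparison and a few header bytes instead of the full page.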
Anubis doesn't rely on spying on the user.

It relies on you enabling browser features that can be used for spying.

Not just LLM companies, but bots in general. They were a big problem even before LLMs.