| HN Mirror



	by rester324 153 days ago
	You can implement this yourself, who is stopping you?


zzzeek 153 days ago

Citation needed


zimpenfish 153 days ago

I use iocaine[0] to generate a tarpit. Yesterday it served ~278k "pages" consisting of ~500MB of gibberish (and that's despite banning most AI scrapers in robots.txt.)

[0] https://iocaine.madhouse-project.org
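For readers unfamiliar with the idea, a tarpit page generator can be sketched in a few lines. This is only an illustration of the general technique, not how iocaine itself works; the word list, the function name `gibberish_page`, and the link format are all assumptions made up for this example.

```python
import random

# Hypothetical word list; a real tarpit would draw on a much larger corpus.
WORDS = ["lorem", "ipsum", "teapot", "crawler", "maze", "entropy", "widget", "quantum"]


def gibberish_page(path: str, n_words: int = 200, n_links: int = 5) -> str:
    """Return an HTML page of nonsense text whose links lead to more nonsense."""
    # Seed from the path so the same URL always produces the same page.
    rng = random.Random(path)
    text = " ".join(rng.choice(WORDS) for _ in range(n_words))
    base = path.rstrip("/")
    # Every link points deeper into the maze, so a crawler never runs out of URLs.
    links = " ".join(
        f'<a href="{base}/{rng.choice(WORDS)}-{rng.randrange(10**6)}">more</a>'
        for _ in range(n_links)
    )
    return f"<html><body><p>{text}</p>{links}</body></html>"
```

Seeding the generator from the request path keeps the server stateless: nothing is stored, yet a repeated fetch of the same URL returns identical content.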


chao- 153 days ago

Can't seem to access this.

It flashes some text briefly, then gives me a 418 TEAPOT response. I wonder if it's because I'm on Linux?

EDIT: Begrudgingly checked Chrome, and it loads. I guess it doesn't like Firefox?


zephen 153 days ago

Doesn't work on my firefox either.

Friendly fire, I suppose.


godelski 153 days ago

Works on my Firefox. Mac and Linux


dpkirchner 153 days ago

Nor Safari on iOS.


zimpenfish 152 days ago

Works fine on my iOS Safari - maybe there's some extension that's tickling it just the wrong way?


dpkirchner 152 days ago

It still fails with all of my extensions disabled (wipr, privacy redirect). I just get a download dialog. I don't know what the HTTP status code is, however.

I found a flagged HN submission about it and it has just about the same result for me and for others. My first tap failed in a weird way (showed some text then redirected quickly to its git repo) and all subsequent taps trigger a download.

https://news.ycombinator.com/item?id=44538010


doublerabbit 153 days ago

Unfortunately, you kind of have to count this as the cost of the Internet. You've wasted 500Mb of bandwidth.

I've had colocation for eight+ years. My monthly b/w cost is now around 20-30Gb a month given to scrapers, where I was only using 1-2Gb a month years prior.

I pay for premium bandwidth (it's a thing) and only get 2TB of usable data. Do I go offline or let it continue?


zimpenfish 152 days ago

> You've wasted 500Mb of bandwidth.

Yep, it sucks, but on the positive side, I'm feeding 500Mb of garbage into them every day and that feels like enough of a small win for me.

> My monthly b/w cost is now around 20-30Gb a month given to scrapers [...] 1-2Gb a month

That definitely sucks.

> Do I go offline or let it continue?

Might be time to start blocking entire IP ranges and ASNs and see if that helps.
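As a rough sketch of what range blocking looks like in application code, Python's standard `ipaddress` module can test a client address against a list of CIDR blocks. The ranges below are reserved documentation prefixes (RFC 5737), not real scraper networks, and `BLOCKED_RANGES` and `is_blocked` are names invented for this example. Blocking by ASN would additionally need an ASN-to-prefix data source, which is not shown here.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks, not real scrapers.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

In practice this kind of check usually belongs in the firewall or reverse proxy rather than the application, so blocked requests cost as little as possible.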


zzzeek 152 days ago

I have no idea what this does because the site is rejecting my ordinary Firefox browser with "Error code: 418 I'm a teapot". Even from a private window.

If I hit it with Chrome, now I can see a site.

Seems pretty far from ready for prime time, as a lot of my viewers use Firefox.


godelski 153 days ago

One of the most popular ones is Anubis. It uses a proof of work and can even do poisoning: https://anubis.techaro.lol/

They even mention iocaine. I know, inconceivable!: https://iocaine.madhouse-project.org/
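To make "proof of work" concrete, here is a minimal hash-based sketch: the server hands out a challenge string, the client searches for a nonce whose SHA-256 digest starts with a given number of zero hex digits, and the server verifies the answer with a single hash. This illustrates the general technique only; the function names, the challenge format, and the difficulty values are assumptions for this example, not Anubis's actual protocol.

```python
import hashlib


def verify(challenge: str, nonce: int, difficulty: int = 4) -> bool:
    """Server side: one cheap hash confirms the client did the work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int = 4) -> int:
    """Client side: brute-force nonces until the digest meets the target."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

Each extra zero digit multiplies the expected client work by 16 while verification stays a single hash, which is the asymmetry that makes mass crawling expensive.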

There are also tons of HN posts on the topic with varying solutions:

https://news.ycombinator.com/item?id=45935729

https://news.ycombinator.com/item?id=45711094

https://news.ycombinator.com/item?id=44142761

https://news.ycombinator.com/item?id=44378127


zzzeek 152 days ago

Anubis is the only tool that claims to have heuristics to identify a bot, but my understanding is that it does this by presenting obnoxious challenges to all users. Not really feasible. Old-school approaches like IP blocking or even ASN blocking are obsolete: these crawlers purposely spam from thousands of IPs, and if you block them on a common ASN, they come back a few days later from thousands of unique ASNs. So this is not really a "roll your own" situation, especially if you are running off-the-shelf software that doesn't have some straightforward means of building in these various approaches of endless page mazes (which I would still have to serve anyway).


GuinansEyebrows 153 days ago

https://forge.hackers.town/hackers.town/nepenthes

> Citation needed

this reply kinda sucks :)
