| HN Mirror


by vintagedave 378 days ago

Or a positive belief in human nature.

I admit I'm one of those people. After decades where I should perhaps be a bit more cynical, from time to time I am still shocked or saddened when I see people do things that benefit themselves over others.

But I kinda like having this attitude and expectation. Makes me feel healthier.


tuyiown 378 days ago

I deeply agree with you, and I'd like to add:

Trust by default, also by default, never ignoring suspicious signals.

Trust is not the same as being naïve; I find the confusion of the two very worrying.


Sammi 378 days ago

You don't have to go as far as to straight up "trust by default". You can instead "give a chance" by default, which is the middle path.

Actually, Veritasium has a great video about this. It's been shown to be the most effective strategy in Monte Carlo simulations.

EDIT: This one: https://youtu.be/mScpHTIi-kM


chasd00 378 days ago

I like that Veritasium vid a lot, I've watched it a couple of times. The thing is, there's no way to retaliate against a crawler ignoring robots.txt. IP bans don't work, user agent bans don't work, and there's no human to shame on social media either. If there's no way to retaliate or provide some kind of meaningful negative feedback, then the whole thing breaks down. Back to the Veritasium video: if a crawler defects they reap the reward, but there's no way for the content provider to defect, so the crawler defects 100% of the time and gets 100% of the defection points. I can't remember when I first read the RFC for robots.txt, but I do remember finding it strange that it was a "pretty please" request against a crawler that has a financial incentive to crawl as much as it can. Why even go through the effort to type it out?
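To put rough numbers on the "no way to retaliate" point, here is a minimal sketch of an iterated prisoner's dilemma. The payoff values (3/3, 0/5, 5/0, 1/1) are the usual textbook ones and are an assumption here, as are the strategy and function names; it is an illustration, not a model of any real crawler.

```python
# Payoffs as (points_for_a, points_for_b). "C" = cooperate, "D" = defect.
# These are the conventional textbook values, assumed for illustration.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy whatever the opponent did last round."""
    return opponent_history[-1] if opponent_history else "C"


def always_cooperate(opponent_history):
    """A site that cannot retaliate: it keeps serving no matter what."""
    return "C"


def always_defect(opponent_history):
    """A crawler that ignores the rules every single round."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play both strategies against each other and return (score_a, score_b)."""
    seen_by_a, seen_by_b = [], []  # each side's record of the other's moves
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(seen_by_a)
        move_b = strategy_b(seen_by_b)
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        seen_by_a.append(move_b)
        seen_by_b.append(move_a)
    return score_a, score_b


if __name__ == "__main__":
    print("site can retaliate:   ", play(tit_for_tat, always_defect))
    print("site cannot retaliate:", play(always_cooperate, always_defect))
    print("both cooperate:       ", play(tit_for_tat, tit_for_tat))
```

With these assumed payoffs, a defecting crawler does far better against a site that can only cooperate than against one that can answer defection with defection, which is the asymmetry described above.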

EDIT: i thought about it for a min, i think in the olden days a crawler crawling every path through a website could yield an inferior search index. So robots.txt gave search engines a hint on what content was valuable to index. The content provider gained because their SEO was better (and cpu util. lower) and the search engine gained because their index was better. So there was an advantage to cooperation then but with crawlers feeding LLMs that isn't the case.
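For anyone who hasn't looked at the mechanics, a small sketch using Python's standard `urllib.robotparser` shows what the "hint" amounts to. The robots.txt content, the `ExampleBot` user agent, and the `allowed` helper are all made up for illustration. The parser only answers the question; nothing stops a crawler from fetching the URL anyway.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of low-value, expensive paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler would check this before each request.
    print(allowed("ExampleBot", "https://example.com/articles/1"))
    print(allowed("ExampleBot", "https://example.com/search?q=x"))
```

The check runs entirely on the crawler's side, so it only helps when the crawler has its own reason to call it.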


Sammi 378 days ago

No, robots.txt can't fix this.

Have you tried Anubis? It was all over the internet a few months ago. I wonder if it actually works well. https://github.com/TecharoHQ/anubis


EPendragon 378 days ago

This is a really cool tool. I haven't seen it before. Thank you for sharing it!

On their README.md they state:

> This program is designed to help protect the small internet from the endless storm of requests that flood in from AI companies. Anubis is as lightweight as possible to ensure that everyone can afford to protect the communities closest to them.

I love the idea!


mathgeek 378 days ago

> Trust by default, also by default, never ignoring suspicious signals.

While I absolutely love the intent of this idea, it quickly falls apart when you're dealing with systems where you only get the signals after you've already lost everything of value.
