Hacker News
by Max-Ganz-II 95 days ago
To stop this, today I put most of my Amazon Redshift research website behind a basic auth username/password wall.

It all remains free, but you need to email me for a username and password.

If I put in time and effort to make content and OpenAI et al copy it and sell it through their LLM such that no one comes to me any more, then plainly it makes no sense for me to create that content; and then it would not exist for OpenAI to take, or for anyone else. We all lose.

It seems parasitic.
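
For readers unfamiliar with the basic auth wall mentioned above, here is a minimal sketch of the server-side check it implies. It is not the site's actual setup: the credentials and the function names `is_authorized` and `make_header` are hypothetical, and a real site would more likely configure this in the web server than in application code.

```python
import base64
import hmac

# Hypothetical credentials, for illustration only.
EXPECTED_USER = "reader"
EXPECTED_PASSWORD = "example-password"


def is_authorized(authorization_header):
    """Return True if the header carries the expected Basic credentials."""
    if not authorization_header:
        return False
    # A Basic auth header looks like: "Basic <base64 of user:password>"
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparison to avoid leaking information via timing.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and password_ok


def make_header(user, password):
    """Build the Authorization header value a client would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

A request without a valid header would normally get a 401 response with a `WWW-Authenticate: Basic` header, which is what makes a browser show its username/password prompt.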

2 comments

An AI is more likely than me to take the time to send you an email requesting access - I'm too lazy.
I think a better approach would be to have a login form and just say "the password is 1234" or whatever.

Virtually no scraper has logic to handle that sort of situation, but it's trivial for humans. Way easier than an LLM

Not true, even Windows Defender is capable of extracting "the password is 1234" from context like emails or webpages.
Please add Internet Archive's bot to your auto-allows, at least. Their bot is presumably well behaved, and for public benefit.
I'm about to ask IA to remove my content!

The reason is that I expect LLM bots to be crawling IA.