Hacker News
by izabera 521 days ago
Just for context, there's a new post on HN every other day about OpenAI DDoS'ing half the internet.

https://news.ycombinator.com/item?id=42660377

https://news.ycombinator.com/item?id=42549624


Just for context, the author of the second link in your comment verifiably lied about blocking crawlers via robots.txt.

CommonCrawl archives robots.txt files.

For convenience, you can view the extracted data here:

https://pastebin.com/VSHMTThJ

You are welcome to verify for yourself by searching for “wiki.diasporafoundation.org/robots.txt” in the CommonCrawl index here:

https://index.commoncrawl.org/

Each index entry contains a file name that you can append to the CommonCrawl URL to download and view the archive. More detailed information on downloading archives is here:

https://commoncrawl.org/get-started
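A minimal sketch of that lookup in Python. It assumes the index's CDX API (`<crawl-id>-index?url=...&output=json`), which answers with one JSON object per capture carrying `filename`, `offset` and `length` fields. It also assumes archives are served from `https://data.commoncrawl.org/`. The crawl ID `CC-MAIN-2024-42` is only an example; the available crawl IDs are listed at https://index.commoncrawl.org/collinfo.json. Each capture is stored as its own gzipped WARC record, so an HTTP Range request fetches just that record instead of the whole archive.

```python
import gzip
import json
import urllib.parse
import urllib.request

INDEX_BASE = "https://index.commoncrawl.org"
DATA_BASE = "https://data.commoncrawl.org/"


def index_query_url(crawl_id: str, target: str) -> str:
    """Build a CDX index query for one crawl, asking for JSON output."""
    query = urllib.parse.urlencode({"url": target, "output": "json"})
    return f"{INDEX_BASE}/{crawl_id}-index?{query}"


def parse_index_lines(body: str) -> list:
    """The index answers with one JSON object per line, one per capture."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def range_header(record: dict) -> dict:
    """HTTP Range covering exactly one gzipped WARC record in the archive file."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # Range end is inclusive
    return {"Range": f"bytes={start}-{end}"}


def fetch_record(record: dict) -> str:
    """Download one capture and return the decompressed WARC record as text."""
    request = urllib.request.Request(
        DATA_BASE + record["filename"], headers=range_header(record)
    )
    with urllib.request.urlopen(request) as response:
        return gzip.decompress(response.read()).decode("utf-8", errors="replace")


def robots_captures(crawl_id: str, target: str) -> list:
    """Look up every capture of `target` in one crawl; return (timestamp, record) pairs."""
    with urllib.request.urlopen(index_query_url(crawl_id, target)) as response:
        records = parse_index_lines(response.read().decode("utf-8"))
    return [(rec["timestamp"], fetch_record(rec)) for rec in records]
```

For example, `robots_captures("CC-MAIN-2024-42", "wiki.diasporafoundation.org/robots.txt")` would return one pair per capture, with the robots.txt body following the WARC and HTTP headers in each record. Both network calls need internet access; the URL and Range helpers are plain string work.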

From September to December, the robots.txt at wiki.diasporafoundation.org contained this, and only this:

>User-agent: *
>Disallow: /w/

If you use robots.txt to ask OpenAI to stop, they actually will.
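The archived file can be checked with Python's standard `urllib.robotparser`. This minimal sketch parses the two quoted lines and asks whether a crawler may fetch a given URL. `GPTBot` is the user-agent token OpenAI documents for its crawler. The wiki paths are hypothetical examples, and `BLOCK_GPTBOT` is a hypothetical file showing what actually asking OpenAI to stop would look like. Because the archived file's only rule is `Disallow: /w/`, every path outside `/w/` stays allowed for every crawler.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above (September to December).
ARCHIVED = "User-agent: *\nDisallow: /w/\n"

# Hypothetical file that would ask OpenAI's crawler (GPTBot) to stay out entirely.
BLOCK_GPTBOT = "User-agent: GPTBot\nDisallow: /\n"


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "https://wiki.diasporafoundation.org"
    for path in ("/w/index.php", "/wiki/Main_Page"):  # hypothetical example paths
        print(path,
              may_fetch(ARCHIVED, "GPTBot", base + path),
              may_fetch(BLOCK_GPTBOT, "GPTBot", base + path))
```

Run as a script, this prints `/w/index.php False False` and `/wiki/Main_Page True False`: under the archived file only `/w/` is off limits, while the `User-agent: GPTBot` / `Disallow: /` group blocks the whole site for that crawler and leaves other crawlers unaffected.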

What Aaron was trying to achieve was great; how he went about it is what ruined his life.

It is a well-known fact that OpenAI stole content by scraping sites hosting illegally uploaded material.
Nobody really asked Aaron about anything; they collected more evidence and wanted to put him in jail.

The school should have unplugged his machine, brought him in for questioning, and told him not to do that.