Hacker News
by dutchbrit 1262 days ago
I wonder if blocking all user agents in robots.txt and then allowing only Googlebot etc. would be sufficient? At the end of the day, content rewritten (by AI or by people) can still break copyright law, which would make what OpenAI and others are doing already illegal. It's their responsibility, not the content creator's. OpenAI and others gathering information online or offline should ultimately get written permission from authors, or only crawl content released into the public domain. The same applies to the person who asks the AI to create content and publishes it (e.g. “Please rewrite the following copyright-protected content”).
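For anyone wanting to try the "block everything, allow only Googlebot" idea, here is a rough sketch of what such a robots.txt might look like, checked with Python's standard `urllib.robotparser`. The rules, the `example.com` URL, the `is_allowed` helper, and the use of `GPTBot` as the sample blocked crawler are all illustrative assumptions, not anything taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow Googlebot, disallow every other user agent.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Parse the rules from the string instead of fetching them over the network.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("Googlebot", "GPTBot"):
        print(agent, is_allowed(agent, page))
```

This is the same kind of check a well-behaved crawler would run against itself before fetching a page. Nothing in robots.txt enforces the result on the server side, so it only describes what the site owner is asking for.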
1 comments

only if the scraper/crawler actually pays attention to robots.txt
Which they ultimately should do when crawling a site. OpenAI does, according to ChatGPT.
as a coworker of mine liked to say, "'should' is a funny word"