by wildpeaks 1207 days ago
Let's not forget that ChatGPT can lie: just because it says something doesn't make it true.

It's more realistic to assume that any data a company is able to access will get gobbled up sooner or later, because there is no real penalty for ignoring robots.txt or licenses at their scale: even if someone were to notice an infraction and had enough money to sue them for years, they can afford it and brush it off as the cost of doing business (and if it's not ChatGPT, then it will be another model; the cat's out of the bag now).

A robots.txt gives as much protection as a "please do not hack me" text file does against ransomware.
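
To make the "no protection" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` user agent, the example.com URLs, and both function names are made up for illustration. The point is only that the check runs inside the crawler, so it happens only if the crawler chooses to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def polite_can_fetch(user_agent: str, url: str) -> bool:
    """A well-behaved crawler asks robots.txt before fetching."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def rude_can_fetch(user_agent: str, url: str) -> bool:
    """A crawler that ignores robots.txt simply never asks."""
    return True


if __name__ == "__main__":
    url = "https://example.com/private/data.html"
    print("polite crawler fetches:", polite_can_fetch("ExampleBot", url))
    print("rude crawler fetches:", rude_can_fetch("ExampleBot", url))
```

Nothing on the server side is involved in either function, which is the whole problem: the file is a request, not an access control.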

2 comments

In a way it’s even worse. Listing the stuff you don’t want crawled might be more like a text file with a list of vulnerabilities that hackers shouldn’t use against you.
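
As a rough sketch of why that is, assuming a made-up robots.txt and a hypothetical helper name `disallowed_paths`: anyone can read the file and pull out exactly the paths the owner would rather keep quiet.

```python
# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /backups/   # old database dumps
Allow: /public/
"""


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path listed in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = line.split("#", 1)[0]
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    for path in disallowed_paths(ROBOTS_TXT):
        print(path)
```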
Considering that bit about robots.txt is true (and I feel it is true), what can one do? Are there no regulations (implemented or in the planning stage) from any of the bodies that decide the standards?

At some point, content owners should, technically, have some way to limit or control who accesses their content.