by dragonwriter 816 days ago
> On the flip side, it wouldn't be hard to put guardrails on chatgpt output so that if too large a percentage of an answer is verbatim, it's blocked.

It wouldn't be hard conceptually, but it would be a copyright violation unless OpenAI could establish a novel kind of fair use, distinct from the AI-training fair use they rely on for ChatGPT not to be a copyright violation no matter what output it produces. What it would involve is building a database that is a mechanical copy of all the copyright-protected works in ChatGPT's training set, integrating it as part of the commercial ChatGPT product, and consulting it using some kind of full-text search on each generation from ChatGPT to verify that no passage of sufficient length was reproduced verbatim.
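For concreteness, here is a minimal sketch of the kind of check being described, assuming the "database" is just an in-memory set of word n-grams taken from the protected texts. The function names, the n-gram length, and the blocking threshold are all hypothetical choices for illustration, not anything OpenAI is known to use.

```python
import re


def word_ngrams(text, n):
    """Return the list of lowercase word n-grams (as tuples) in text."""
    words = re.findall(r"\w+", text.lower())
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def build_index(protected_works, n=8):
    """Collect every word n-gram from the protected works into a set.

    Hypothetical stand-in for the 'mechanical copy' database: it is
    derived directly from the full text of each work.
    """
    index = set()
    for work in protected_works:
        index.update(word_ngrams(work, n))
    return index


def verbatim_fraction(output, index, n=8):
    """Fraction of the output's n-grams that appear verbatim in the index."""
    grams = word_ngrams(output, n)
    if not grams:
        # Output shorter than n words: nothing long enough to match.
        return 0.0
    hits = sum(1 for gram in grams if gram in index)
    return hits / len(grams)


def should_block(output, index, n=8, threshold=0.5):
    """Block the output if too large a share of it is verbatim (assumed threshold)."""
    return verbatim_fraction(output, index, n) >= threshold
```

Note that even this toy version has to ingest the full text of every protected work to build the index, which is the step the comment above is pointing at.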


Not necessarily. YouTube has fingerprints of copyrighted works for this exact purpose, and it works fine.
YouTube Content ID is based on a specific agreement with the individual content owner that permits the specific use, which works for YouTube because it's for UGC, not content YouTube generates.