> or cherry-picked their examples from many attempts

even if that were true, is it really a defence?

> It said AI tools have to incorporate copyrighted works to “represent the full diversity and breadth of human intelligence and experience.”

that sounds like an appeal "our product isn't much use if we can't violate copyright"

I bought an MP3 album from Amazon last weekend. One of the many things I got from that purchase was the ability to copy that album for other people, which would be a copyright violation. That doesn't make the purchase unjustifiable, immoral or illegal: my actual use for the album justifies the purchase. The possible copyright violation is irrelevant.
People will try to trick you with statements that mention something bad and omit everything good. Don't let them. Think about what's omitted. Does ChatGPT get anything good, legal, useful from reading the NYT? I'd say it does. For example, it gets the knowledge necessary to explain things in three paragraphs, partly based on NYT articles. And partly based on Wikipedia, which in turn is based on the NYT.
OpenAI is saying that training to provide a three-paragraph summary of recent events is fair use of newspapers, and that such training is not realistically possible without copyrighted materials. It's saying that if you make copyright violations impossible instead of difficult, then you can't use the articles fairly either. Sounds persuasive to me.
There's a second aspect, less important IMO: de minimis non curat lex. "The law does not concern itself with trifles", basically. If OpenAI made it really difficult to make GPT do a certain thing, if you have to try many times and it's not even clear whether each attempt succeeded, then the possibility of doing that thing isn't a matter for the law, says that principle.