| Yes, that's a defense. Twice over. I bought an MP3 album from Amazon last weekend. One of the many things I got from that purchase was the ability to copy that album, which would be a copyright violation. That doesn't make the purchase unjustifiable, immoral or illegal; my actual use for the album justifies the purchase. The possible copyright violation is irrelevant. People will try to trick you with statements that mention something bad and omit everything good. Don't let them. Think about what's omitted. Does ChatGPT get anything good, legal, useful from reading the NYT? I'd say it does. For example, it gets the knowledge necessary to explain things in three paragraphs, partly based on NYT articles. And partly based on Wikipedia, which in turn is based on the NYT. OpenAI is saying that training to provide a three-paragraph summary of recent events is fair use of newspapers, and that such training is not realistically possible without copyrighted materials. It's saying that if you make copyright violations impossible instead of difficult, then you can't use the articles fairly either. Sounds persuasive to me. There's a second aspect, less important IMO: de minimis non curat lex, basically "the law does not concern itself with trifles." If OpenAI has made it really difficult to make GPT do a certain thing, so that you have to try many times and it isn't even clear whether each attempt succeeded, then by that principle the possibility of doing that thing isn't a matter of law. |
The NYT does not give readers a license to reproduce substantial portions of its articles verbatim on their own websites, even though buying paid access to the NYT website technically gives them the ability to do so. In the same way, OpenAI technically gained that ability but did not acquire the right to use it. The funny fact that you have to enter some magic words into a form on their website before it regurgitates entire article texts does not change anything; if I did the same on my website, for example via a form that asks the user to enter the first 100 words verbatim before spitting out the entire text, it would obviously still be illegal. The same goes for the fact that you have to make several attempts before one of them reproduces the entire text correctly; I could replicate that just as well by adding an RNG that returns the valid text in only 20% of cases, and my website would still clearly be illegal.