The Times Sues OpenAI and Microsoft Over A.I.’s Use of Copyrighted Work (nytimes.com)
45 points by thecybernerd 904 days ago
7 comments

Inevitable outcome. Since ChatGPT launched, nobody has a clue as to what is legal and what is illegal with these chat-based LLMs.

Is the content that LLMs produce enough to rise to the level of copyright infringement? Is the fact that a company trained their LLM on your data, with the knowledge it would be used for outputs (=profit), enough that all of their outputs should be considered, to at least a minuscule degree, influenced by your work? How would ChatGPT's "training" differ from, say, another journalist who reads the NYT, and subconsciously uses that to help provide better services?

None of us can answer these questions definitively. It was a foregone conclusion that the courts would end up hearing these sorts of arguments. I think a lot of the large LLM makers (certainly OpenAI's competitors) are going to breathe a sigh of relief that this is happening sooner rather than later, so they know where the legal lines will be drawn.

This will be an interesting inflection point for humanity.

Though, call me jaded, but I can’t help but doubt that the _actual_ content creators, the writers themselves, will see any of the money should The Times win or settle the case.

The content creators for the Times have already been paid for their work.
They were paid when the original content was printed or posted on the internet.

Subsequently selling (or extracting compensation for) those works to AI companies is an emergent revenue stream.

I suppose the NYT isn’t legally obligated to share that revenue fairly with the authors, but it’d be awful nice if they did.

Believe me, publishers have enough trouble keeping writers employed. If they could give them a larger cut or do some kind of revenue share, most editors and GMs would love to (and many do).
The trajectory we're seeing with quality small AI models, coupled with self-imposed censorship and the foreseeable scarcity of high-quality training data due to new copyright law, leads me to forecast a surge in "pirate" models.

Increasingly, the distinction between core model training and fine-tuning might become ambiguous (how?). Considering this, we might see custom "add-ons" for AI models become commoditized. Imagine simply downloading a "New York Times" pack to enhance your unofficial "pirate" language model.

"The legal landscape surrounding generative-AI is unsettled, with the technology still in its early days. There are other lawsuits that could test the rights of AI companies to “scrape” content from the web to train AI tools, including one by several prominent book authors against OpenAI. In February, Getty Images sued the AI art company Stability AI in Delaware, alleging that it had infringed on Getty’s copyrights."

Any news or speculation on these cases?

heh. good luck with that one. everyone is crawling the web now. why didn't they sue google for using their content in the serps?
There are significant differences: attribution and snippetting. OpenAI probably cannot claim these.
And Google search doesn't "generate" new content that potentially puts out of business the very same entities it learned from.
currently, and the non-paywalled link: https://news.ycombinator.com/item?id=38781941