HN Mirror


by josephg 557 days ago

I think the real question is this: Is ChatGPT "just copying" the content in its training set? What constitutes plagiarism, exactly?

If ChatGPT is reproducing content verbatim from its training set, then I think the claim that it's violating copyright holds a lot of water. (And I think there was a NYT lawsuit claiming such - and I wish them well.)

But if ChatGPT learns from 100 recipes for bechamel sauce, and synthesizes them into its own, totally original description, then I don't see how what it's doing is any different from what the authors of those recipe books & websites are doing. If anything, it's probably synthesizing a lot more sources than any recipe author. If the only common factor between ChatGPT's output and any specific source is the (public domain) recipe itself, then that seems ethically in the clear to me.

I can't see a justification to criminalise what ChatGPT is doing with recipes without casting so wide a net as to open recipe authors up to prosecution in the same way.

Scraping a website isn't illegal. When humans do it, we call it browsing the web.


_heimdall 557 days ago

At a minimum it's a big legal gray area. Writing a book review isn't illegal and requires no financial engagement with the publisher, but I can't actually find out whether SparkNotes or CliffsNotes have to pay royalties. Those would be a pretty good parallel in my mind: they are doing more than a quick summary or review and are effectively compressing the content.

It feels wrong to me, but that says nothing of the laws we currently have or how a judge would rule on it. Personally, if I were on a jury I'd be inclined to side with the NY Times in their case against OpenAI, with the huge caveat that I only know the basics of their case and am not bound to only what's officially in evidence.


josephg 556 days ago

Yeah, I feel the same re: NY Times. But that's because (iirc) the model was reproducing large parts of their articles word-for-word.

But so long as ChatGPT doesn't reproduce any of its sources word for word, I don't think it's a problem. Especially since cookbooks have been doing the same thing for centuries.

At least, I think that's where I would draw the line. But I agree - we're in very new territory. Who knows what a judge will think.
