| HN Mirror


by grayhatter 1167 days ago

I completely agree with this, but my understanding of how LLMs work is that they don't copy meaningful segments of text from any specific source. Instead, they predict the next block of text, which they'd only do if they've seen that idea/sequence enough times, with context, to rank the prediction high enough.

I haven't seen any service copy out large blocks of text often enough to make me think it's reasonable to call their output plagiarized.

Meaning, if the LLM I use will only repeat an idea that many someone's have written about, such that it's seen the idea, or parts of that idea many times, why is that still plagiarism? Or rather, why is it worthy of direct attribution? Or why was I wrong to use the argument about citing a dictionary here?

(I'm aware that a number of people are working on giving AI memory so it can quote from pages like Wikipedia. But I don't think it's fair to call that "training data".)

ipaddr 1167 days ago

"use will only repeat an idea that many someone's have written about, such that it's seen the idea, or parts of that idea many times"

As you go narrower with a query, only one source of truth is available; at that point it does plagiarize.

grayhatter 1167 days ago

Very interesting, any chance you've got a citation for this? I'd like to have some sort of proof next time I tell someone that they will happily plagiarize from single sources.

schwartzworld 1167 days ago

Even if it's not copying word for word, if it's not citing its data, it's still plagiarism. Plagiarism includes copying ideas without crediting their source.

grayhatter 1166 days ago

So there's no way to prove this claim isn't something someone made up one day?

schwartzworld 1166 days ago

Here's a counter question:

Have you ever seen ChatGPT cite or credit its sources?

grayhatter 1166 days ago

I don't (yet) believe that what ChatGPT does requires citation to be ethical/honest.
