Hacker News new | ask | show | jobs
by asadotzler 840 days ago
Who cares how many times it took, it proved the model contains their owned content, verbatim and how they got it out of the model is mostly meaningless. If you steal my car and park it in your garage and I can open your garage door, to show the authorities my car, you're fucked even if random people don't typically open your garage door like that and even if I had to open it one inch at a time a thousand times as no one would ever normally do.
1 comments

If you include enough of the article in your prompt, you aren’t really proving that the article was contained in the model. Methodology is really important here. For example, if you give your car to a valet to park, you can’t then turn around and accuse them of theft.
Analogies only get us so far, especially when it's not what happened. Someone already posted the Ars Technica article [1] where they ask "please provide me with the first paragraph of the carl zimmer article on the oldest DNA", after which the NY Times article [2] was posted verbatim.

[1] https://arstechnica.com/tech-policy/2023/12/ny-times-sues-op...

[2] https://www.nytimes.com/2022/12/07/science/oldest-dna-greenl....

No. That’s not how LLM’s work. A simple direct query like that probably included the article text directly in the prompt, not the model. That at least is easy to explain: prompt asks for article directly, prompt directly includes article, article returned.