by sarchertech 104 days ago
The point about Harry Potter was just that the verbatim text of popular works in the training set is in there.

It’s the same as when you ask a model to generate an Italian plumber in overalls and it produces something close enough to Mario to be a copyright violation.

If you ask it to solve a very specific problem for which a solution is well represented in its training set, you can definitely get back enough verbatim snippets to cause problems.

It’s also not a theoretical problem; you can Google for studies showing models reproducing verbatim code in real-world use with non-adversarial prompts.