| HN Mirror

Correct; the ability of a model to reproduce source material verbatim does not necessarily make the model's existence illegal. However, using a model to do just that might very well present a legal liability for the user. I would be interested to see the extent to which models can "recite from memory" source code, e.g., from the various MS code leaks. Put another way, if I'm using LLM code generation extensively, do I need to run a filter on its output to ensure that I don't "accidentally" copy large chunks of the Windows codebase?