HN Mirror


by VBprogrammer 7 hours ago

In general humans don't have perfect recall. Even people with what we might call a photographic memory don't have the ability to memorise millions of lines of code and output them with little effort.

It hinges somewhat on how much you believe is actually being learned and how much is just pattern matching and borrowing a solution from memory. Certainly in the early days of Copilot it was possible to get it to output chunks of open source code near verbatim.

I think, generally, people are probably closer to believing that these models carry out some kind of reasoning than they were in those early days, but it would also be easy to strip all of the immediately identifiable comments etc. from the training materials to make such copying harder to detect.

2 comments

antonvs 1 hour ago

> how much is just pattern matching and borrowing a solution from memory.

It's easy to show that this is not the case. Models going beyond what they have memorized is a well-known phenomenon in ML, known as generalization - specifically, compositional generalization. See e.g. https://research.google/blog/measuring-compositional-general... for a description, although note that the post is from 2020, and models have become much better at this since then.

People can "believe" what they want, but there's plenty of work that definitively falsifies beliefs about "borrowing a solution from memory".

iwontberude 6 hours ago

If it outputs copyrighted material, which it does handily, then it doesn’t really matter.