by mrh0057
1811 days ago
Why is everyone ignoring what neural networks actually do? This one is being used as context-aware pattern matching over what you have typed, and it uses that to predict what you will write next. Of course it's going to return copyrighted works based on what you write. It's a pattern-matching algorithm; what exactly did they think it was going to do?
What they maybe aren't considering is that this specific snippet is famous. It has likely been pasted thousands of times, with and without attribution, in public GitHub repositories.
Yes, it has seen code before. No, it didn't memorize the entirety of the dataset it was trained on. If it had, it would have explicitly overfit, wouldn't generalize to downstream tasks, and would ultimately have failed at being useful in the general case.
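To make the "famous snippet" point concrete, here's a toy sketch. It is nothing like Copilot's actual model: it's a plain n-gram counter, and the corpus, the `famous` string, and the function names `train` and `complete` are all made up for illustration. The only point is that a next-token predictor trained on a corpus where one snippet appears a thousand times will greedily reproduce that snippet, without "memorizing" the rest of the data.

```python
from collections import Counter, defaultdict

def train(corpus, order=2):
    """Count which token follows each `order`-token context."""
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.split()
        for i in range(len(tokens) - order):
            context = tuple(tokens[i:i + order])
            counts[context][tokens[i + order]] += 1
    return counts

def complete(counts, prompt, order=2, max_tokens=20):
    """Greedily append the most frequent next token until the context is unseen."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        context = tuple(tokens[-order:])
        if context not in counts:
            break
        tokens.append(counts[context].most_common(1)[0][0])
    return " ".join(tokens)

# Hypothetical corpus: one snippet pasted 1000 times, two others seen once.
famous = "return magic ( x ) * 2 ;"
corpus = [famous] * 1000 + ["return magic ( y ) + 1 ;", "return other ( x ) * 3 ;"]

counts = train(corpus)
print(complete(counts, "return magic"))  # prints: return magic ( x ) * 2 ;
```

The two one-off snippets barely register in the counts, so any prompt that overlaps the famous one gets pulled toward it.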
Unfortunately, the answer is still "we don't know", but what may have happened is that their transformer architecture creates a more efficient representation of the byte-pair-encoded tokens representing the code. In doing so, it is able to learn about the context, structure, and logic of the language it is trained on.
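For anyone who hasn't run into byte pair encoding, here's a minimal sketch of the merge-learning idea. Assumptions: it works on characters rather than raw bytes to keep it short, the helper names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) and the two sample lines are mine, and it says nothing about the tokenizer Copilot really uses.

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Return the most common adjacent symbol pair, or None if there are no pairs."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

def learn_bpe(texts, num_merges):
    """Start from single characters and repeatedly merge the most frequent pair."""
    sequences = [list(t) for t in texts]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(s, pair) for s in sequences]
    return merges, sequences

# Hypothetical sample input: two near-identical lines of code.
texts = ["for i in range(n):", "for j in range(m):"]
merges, sequences = learn_bpe(texts, 10)
print(merges)
print(sequences)
```

Character runs that recur across the lines get folded into single symbols, so the model sees shorter sequences built from larger, reusable pieces.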
Anyway, I think this whole thing is absurd. So far, every "atrocity" I have seen committed by Copilot is easily achievable with GitHub advanced search using "code contains text".