Hacker News
by svaha1728 1807 days ago
So if a foreign company pilfers the source code to Windows, can they add it to a training set and then 'prompt' the machine learning algorithm to spit out a new 'copyright free' Windows, just by transforming the variable names?
2 comments

I think that's my question regarding this whole thing:

If it's really fair use, why not train it on all Microsoft code, regardless of license (in addition to GitHub.com)? Would Microsoft employees be fine with Copilot re-creating portions of Windows "from memory" for use in WINE?

Well, no, because only GitHub has access to the training set. But more importantly, this misunderstands how Copilot even works -- even if Windows were in the training set, you couldn't get Copilot to reproduce it. It only generates a few lines of code at a time, and even then it's almost certainly entirely novel code.

Now, if you knew the code you wanted Copilot to generate, you could certainly type it character by character, and you might save yourself a few keystrokes with the TAB key. But it's going to be much, MUCH easier to simply copy the whole codebase as files, and then you're right back where you started.

GPT-3 is still exclusively licensed to Microsoft, but a similar model can be put together with the freely available GPT-2 and a corpus of source code -- especially if your intent is copyright transfer.

As Francois Chollet points out in this talk, deep neural network models are ultimately locality-sensitive hash tables, so the examples of people pulling out source code are an inherent shortcoming of deep learning models in general. Give the right 'key' and you can 'recall' the value you are looking for.

https://www.youtube.com/watch?v=J0p_thJJnoo
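
To make the 'key'/'recall' analogy concrete, here is a minimal sketch of a locality-sensitive hash table in Python. It is only a toy illustration of the analogy, not how Copilot or GPT actually stores anything: the class name `ToyLSHTable`, the 4-dimensional key vector, and the stored string are all made up for the example, and random-hyperplane signatures are just one common LSH scheme.

```python
import random


class ToyLSHTable:
    """Toy locality-sensitive hash table using random-hyperplane signatures."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # Each hyperplane is a random normal vector; one signature bit per plane.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = {}

    def signature(self, vec):
        # Bit i records which side of hyperplane i the vector falls on.
        return tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0
            for plane in self.planes
        )

    def store(self, key_vec, value):
        self.buckets.setdefault(self.signature(key_vec), []).append(value)

    def recall(self, query_vec):
        # Return every value whose key hashed to the same bucket as the query.
        return self.buckets.get(self.signature(query_vec), [])


table = ToyLSHTable(dim=4)
key = [0.9, -1.3, 0.4, 2.0]             # hypothetical stand-in for an embedded prompt
table.store(key, "memorized snippet")   # hypothetical stand-in for training-set code

same_direction = [2 * x for x in key]   # same direction as the key, different magnitude
opposite = [-x for x in key]            # points the opposite way from the key
```

Querying with `key` or `same_direction` lands in the bucket that holds the stored value, while `opposite` flips every signature bit and recalls nothing. A slightly perturbed key will usually, but not always, hit the same bucket, which is the sense in which the right 'key' recalls the value.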