by flowtheorist
1598 days ago
It's definitely a gray area, because these AI models are essentially compression engines: they encode the code samples/data into the weight matrices that make up the ML model and then "uncompress" them to serve queries. I think it would be easy to argue that a compressed data set, no matter how illegible, needs to conform to the same license as the data set it encodes, but I don't think any lawyer is smart enough to make that case. So for now it will probably remain a very convenient loophole: companies with enough compute can sidestep licensing restrictions by mangling whatever data/code they want to use beyond recognition, encoding it into some neural network, and then selling it as AI.

For why these things are essentially mangled compression engines, take a look at "Hopfield Networks is All You Need": https://arxiv.org/abs/2008.02217. It shows that the attention mechanism in modern transformer networks (which is what Copilot uses) is equivalent to the update rule of a modern Hopfield network, so a transformer can be read as a bunch of Hopfield networks, which are essentially memory modules, connected in some complicated topology to encode some data set.
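To make the "memory module" point concrete, here is a minimal NumPy sketch of the retrieval step described in that paper: the new state is a softmax-weighted average of the stored patterns, which has the same form as attention with the stored patterns acting as both keys and values. The function names, the value of `beta`, and the toy data are my own assumptions for illustration. It only shows that this update rule can pull a stored pattern back out of a corrupted query; it says nothing about what Copilot itself has or hasn't memorized.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def hopfield_retrieve(patterns, query, beta=8.0, steps=1):
    """Modern Hopfield update: state <- softmax(beta * X @ state) @ X.

    patterns: (N, d) array, one stored pattern per row (hypothetical toy data).
    query:    (d,) array, a possibly corrupted version of a stored pattern.
    """
    state = query.astype(float)
    for _ in range(steps):
        weights = softmax(beta * patterns @ state)  # similarity to each stored pattern
        state = weights @ patterns                  # weighted average of stored patterns
    return state

# Store 5 random +/-1 patterns of dimension 64.
rng = np.random.default_rng(0)
patterns = rng.choice([-1.0, 1.0], size=(5, 64))

# Corrupt pattern 2 by flipping its first 10 entries, then retrieve.
noisy = patterns[2].copy()
noisy[:10] *= -1
recovered = hopfield_retrieve(patterns, noisy)

# Index of the stored pattern that best matches the retrieved state.
closest = int(np.argmax(patterns @ recovered))
```

With a large `beta` the softmax is sharply peaked, so one update step lands almost exactly on the nearest stored pattern; with a small `beta` the result is a blend of several patterns, which is closer to the "mangled" behavior described above.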