Hacker News
by losvedir 1804 days ago
I understand some of the legal implications of regurgitating licensed code verbatim. But what about this: what if the current Copilot is just working out the kinks, and the real product is per-organization models that use transfer learning on each organization's own repos?

At work, we store all our code in GitHub repos, some public, some private. As things stand, I think there's a lot of legal ambiguity around using Copilot. But suppose all that training code just served to teach the model the structure of programs and common syntactic constructs, and then another layer were trained on our code, with its idioms, modules, and names. Then maybe it would regurgitate our code in a way that's useful and doesn't run afoul of licenses.

I'm thinking of a fast.ai course I did, where I took a base model trained on generic image data and then did transfer learning on top of it: I fed it labeled images of Go games and chess games, and with only maybe 100 of those images it learned to distinguish the two with shocking accuracy. As I understand it, the base model taught it how to look for things like lines, corners, and contrast, and then it could easily be specialized. Could something similar be the case here?
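
A toy sketch of that Go-vs-chess setup, to make the "frozen base + small trained head" idea concrete. This is not fast.ai's API or what the course actually ran: the "pretrained" base is a hypothetical hand-written function (`frozen_features`) standing in for a CNN, the 8x8 "photos" are synthetic (`make_image`), and the head is a plain logistic regression. The point it illustrates is that only the head's weights are learned from ~100 labeled examples; the base is reused unchanged.

```python
import math
import random


def make_image(kind, rng, size=8, noise=0.05):
    """Hypothetical toy 'photo': a size x size grid of 0/1 pixels with random flips."""
    img = []
    for r in range(size):
        row = []
        for c in range(size):
            if kind == "go":
                on = r % 2 == 0 or c % 2 == 0  # board drawn as grid lines
            else:
                on = (r + c) % 2 == 0  # chessboard drawn as alternating squares
            if rng.random() < noise:
                on = not on  # random pixel noise
            row.append(1 if on else 0)
        img.append(row)
    return img


def frozen_features(img):
    """Stand-in for a pretrained base model; nothing below ever updates it."""
    rows, cols = len(img), len(img[0])
    full_rows = sum(all(row) for row in img) / rows  # crude horizontal-line detector
    full_cols = sum(all(img[r][c] for r in range(rows)) for c in range(cols)) / cols
    density = sum(map(sum, img)) / (rows * cols)  # crude fill/contrast measure
    return [full_rows, full_cols, density]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(feats, labels, lr=0.5, epochs=200):
    """Train only a logistic-regression head on top of the frozen features (SGD)."""
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    # 1 = Go, 0 = chess; logit >= 0 is the same as probability >= 0.5.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


def run_demo(seed=0, n_train_per_class=50, n_test_per_class=100):
    rng = random.Random(seed)

    def sample(n):
        data = [(make_image(kind, rng), label)
                for kind, label in (("go", 1), ("chess", 0))
                for _ in range(n)]
        rng.shuffle(data)
        return data

    train, test = sample(n_train_per_class), sample(n_test_per_class)
    # ~100 labeled training images; only w and b are fitted.
    w, b = train_head([frozen_features(img) for img, _ in train],
                      [label for _, label in train])
    correct = sum(predict(w, b, frozen_features(img)) == label for img, label in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

In a real setup the frozen part would be a network pretrained on a large generic dataset, and you might later unfreeze some of it for fine-tuning; the sketch keeps it fully frozen so the split between "generic" and "specialized" knowledge is easy to see.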


Yes, I think so too, just not in the near future, as we don't really have enough computing power to make that feasible. I've written a few thoughts about this here: https://vladiliescu.net/github-copilot-first-impressions/#po...