by yangff
1812 days ago
So, I can see that this ML model generates some code that is exactly the same as in the original dataset, which is definitely a problem. A defective model, sure.
Besides that, I cannot understand why the overall idea, using open-source projects to train an ML model that generates code, would ever be a problem. We human beings learn the same way the model does: we read others' code, books, articles, design patterns... and it becomes part of us. The same goes for private code: when you join a company, you read their codebase and methodology, and it becomes something of yours. Copyright generally does not allow you to "copy" the original, but you can still synthesize your own code: cutting, combining, and creating based on whatever you have learned.
The way an ML model works differs from the human brain for sure, but I cannot see why this would be a problem, or why an organic being would be somehow superior, such that what it does is creation while an ML model is merely scraping your code. What is the difference here? Also, recently we saw GPT, which generates articles, and waifulabs, which generates ... waifus... To be honest, I cannot perceive the difference, since all of them are "learning" (in a mechanical way) from human-created knowledge.
|
I'm really waiting for this to blow up from the open source license angle. Freely combining code under different licenses is a hellish undertaking on its own. But even just re-using some, say, GPL code, while staying under the same license but without proper attribution, is Forbidden with a capital F.