by fire
1216 days ago
I don't understand why they aren't tagging training data with license information and letting users choose models that exclude certain licenses. It seems like the middle ground given the stance they've taken: "we don't think it's a problem, but if it makes you feel better, you can use these other models that specifically aren't trained on GPL code, or whatever." I would prefer to see full license attributions included in generated responses, though. That also seems like something it wouldn't be too difficult to generate a licenses file from? Amazon's CodeWhisperer has a "reference tracker" that tells you the license of the training-data code if the generated response is within some similarity threshold, but that's still not good enough, imo.

Exactly. By all means build tools like this, but build them to actually comply with Open Source licenses. Provide a list of the licenses you don't mind copying from, and get back attributions with your suggestions.
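
For what it's worth, here is a rough sketch of what "provide an allowlist, get back attributions" could look like from the user's side. Everything in it is hypothetical: the `Suggestion` type, its fields, and both functions are made up for illustration and are not the API of Copilot, CodeWhisperer, or any other real tool. It also assumes the tool already knows which training source each suggestion resembles, which is the hard part.

```python
from dataclasses import dataclass


# Hypothetical record: a suggestion plus the training source it resembles.
@dataclass(frozen=True)
class Suggestion:
    code: str
    source: str      # where the matching training code came from
    license_id: str  # SPDX identifier, e.g. "MIT"
    author: str


def filter_by_license(suggestions, allowed):
    """Keep only suggestions whose source license is in the allowlist."""
    return [s for s in suggestions if s.license_id in allowed]


def licenses_file(suggestions):
    """Build attribution text, one line per distinct source, in input order."""
    lines = []
    for s in suggestions:
        line = f"{s.source} ({s.license_id}) by {s.author}"
        if line not in lines:
            lines.append(line)
    return "\n".join(lines)


# Made-up example data.
candidates = [
    Suggestion("def a(): ...", "example.org/repo-a", "MIT", "Alice"),
    Suggestion("def b(): ...", "example.org/repo-b", "GPL-3.0-only", "Bob"),
    Suggestion("def c(): ...", "example.org/repo-c", "Apache-2.0", "Carol"),
]

accepted = filter_by_license(candidates, allowed={"MIT", "Apache-2.0"})
print(licenses_file(accepted))
```

With the made-up data above, the GPL-licensed suggestion is dropped and the other two end up in the attribution text.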