How did you draw those conclusions? They don't seem to be in line with court rulings (e.g. the Anthropic case), which hold that training is fair use. Code is being treated the same as any other copyrighted content used for training, from blog posts to company PR announcements and everything in between. Of course, the copyright in those blog posts and PR announcements is held by their authors, with no license granted at all. So if using OSS code in training is a violation, then so would be using everything else being trained on (to a first approximation; public domain works excepted). But no court has ever taken that position, to my knowledge.

There's just so much confusion around this. In this thread alone:

* Distillation is legal under copyright; any violations would be ToS violations, which fall under contract law, not copyright law.
* Training is legal as well, so long as the original material was obtained legally.
* Moving code off of GitHub doesn't change any of this: AI companies are free to download your git repo no matter where it is hosted, just like any other content on a publicly accessible website.
* Liability comes into the picture when the models are used to infringe copyright in their output. We'll have to see the outcome of the NYT case here, but that is proceeding at a glacial pace.

I am not a lawyer; I'm an interested amateur who has been following the saga for years. I wish the discussion here on HN were more nuanced. If anyone has legal updates that render any of the above incorrect, I'd love a pointer to the decisions. One area where I'm particularly weak is the legal status in countries other than the US: I don't follow those laws, or the court cases brought under them, nearly as carefully.
C'mon, I'm not even a part of the movement to move away from GitHub, but that's not really a valid argument. Sure, they CAN download the source code, but it's not nearly as automatic. They don't get to download it all, en masse, by copying hard drives/databases they already own. They have to go over the internet. They don't get automatic notifications when new code gets pushed. And finally, if you wanted to, you could make it harder for bots.
I certainly believe that these companies get away with a lot more than the average Joe (see: Facebook downloading Anna's Archive, every pirated ebook there is), but that doesn't mean you have to hand it to them on a silver platter.
Plus, even if your code is private on GitHub, you can't guarantee that they won't train their models on it anyway, unlike if you host it yourself or somewhere else.
Does anyone else find it ironic when closed-source GitHub claims to be some superhero for open source?