| Part of how AI works is that it's just really complicated compression: you can get AI to write out Harry Potter novels word for word with the right prompting. When it picks out a rare bit of code, it is simply copying that code, illegally, and presenting it without attribution or any license, which is in fact breaking the law, but AI companies are too important for the law to apply to them. There have been instances where models have spat out code comments that mention the original authors, etc., effectively outing themselves as copyright thieves. There's nothing anyone can do about it, but the suspicion is that the big companies have taken everyone's code on GitHub, without consent, and trained on it. And now they are spitting out big chunks of copyrighted code and presenting it as somehow transformed, even though all they've actually done is change a few variable names. It is copyright theft, but because programmers are little people, not Disney, we don't have any recourse. |
It's pretty likely that I've done the same thing. I mean, I've written enough CRUD functions in my life, for example, that in all likelihood I'm regurgitating stuff that's a copy, for all practical purposes, of stuff I've done before as work-for-hire for my employer. I'm not stealing intentionally or consciously, but it seems quite likely that it's happening. And that's probably true for many of you, at least those of you who have been in the industry for a while.