Hacker News
by Ecco 1343 days ago
You’re missing the point. It’s not an ego problem: if you put your code on the internet with a license you should expect people to respect the license’s rules…
3 comments

I think it's a gray area in the license. Much of the code was intended to be used freely and commercially by others, but not for AI training. It follows the license to the letter, but not the intent.

I expect we'll see new licenses appear making it clear whether or not the content can be used for training.

Who's to say what the intent was? I've published lots of code with very permissive licenses and I did so because I want people to be able to use that code for any reason. That's why I choose those licenses.
I think that's exactly why AI training (allowed or not) should be added to licenses.
There's nothing gray about it. The license requires attribution, and Copilot doesn't provide that attribution.
It's reading the code and generating similar code, not copying it.
Are you saying that's fair use? If so, then we won't see new licenses appear related to it, since a license can only give you more permissions on top of fair use, not take away fair use. If not, then we still won't see new licenses appear related to it, since the existing licenses already don't allow it.
Good point. I'm not a lawyer, but looking it up, the factors for fair use are:

1. the purpose and character of the use; 2. the nature of the copyrighted work; 3. the amount and substantiality of the portion used; 4. the effect of the use upon the potential market for the original work.

All of these are quite debatable, and I'll leave it to someone more familiar with the law.

Though if it's not, I believe there are licenses that allow derivative uses of code and licenses that don't. For many of these, the intention is that the code be used to create more code, not to fuel AI behemoths.

Not everyone believes in intellectual property and good luck enforcing that license worldwide.
It doesn't matter what you believe. It matters what the judge and jury say when this goes to trial, and it will go to trial because Microsoft has a lot of money.
So? Most of the developed world has legal systems that do believe in intellectual property. The fact that a few people "don't believe in intellectual property" because they want to torrent movies/games is mostly irrelevant when it comes to the software engineering profession.
Except you're not licensing functions, you're licensing a repository. If I use a sentence or even a paragraph from a copyrighted book, it's not copyright infringement.
> If I use a sentence or even a paragraph from a copyrighted book, it's not copyright infringement.

Note that this may not actually be true, and you may need to pay to license even shorter excerpts of creative work. Copyright is a complex topic. It's not always safe to assume that you have the rights you think you have, in terms of reproducing others' work.

For example: "The proportion of a total work is not the only factor, though. If you are including the most crucial aspect of a work, even if it is only a small part, then the question of “substantiality” comes into play." [1]

[1] https://www.dukeupress.edu/getmedia/3363cb6e-04b6-43ec-b004-...

It can be if you fail to give attribution. Plagiarism isn't just unethical, unprofessional and immoral (not to mention evidence that the plagiarist is an uncreative dullard). It's illegal. How many words or sentences it takes to trigger a complaint is mostly governed by what it takes to prove a violation. The more material copied, the easier that can be. In this situation providing attribution (tooltip when you mouse over the code?) would probably satisfy 9/10 of potential complaints. But big companies usually won't make that kind of minimal effort without being hit upside the metaphorical head with a piece of metaphorical lumber (like with an actual lawsuit).
If you take a function from a repository (or a sentence from a book), it is the unlicensed use of copyrighted material. Everything in the repository is covered by the license, functions, files… everything.

Whether or not it is infringement depends on whether the use can be considered fair use. This is a more nuanced question and is not always clear.

In this case (Copilot) the real question is how transformative the AI training is. Given how verbatim some of the outputs are, the argument is less clear.

> If I use a sentence or even a paragraph from a copyrighted book, it's not copyright infringement.

I'm assuming you're referring to fair use. In that case whether it's copyright infringement or not is very situational (the legal standard consists of a test with various subjective factors) and isn't as simple as "it's less than a paragraph so I can copy whatever I want".

> If I use a sentence or even a paragraph from a copyrighted book, it's not copyright infringement.

It is…

Then why do people put copyright and license notices at the top of every file in the repo?