Hacker News
by heavyset_go 103 days ago
> If you sign off the code and put your expertise and reputation behind it, AI becomes just an advanced autocomplete tool and, as such, should not count in “no AI” rules.

No, it's not that simple. AI generated code isn't owned by anyone, it can't be copyrighted, so it cannot be licensed.

This matters for open source projects that care about licensing. It should also matter for proprietary code bases, as anyone can copy and distribute "their" AI generated code for any purpose, including to compete with the "owner".


Care to explain? I see that statement in this thread, but I am not sure where this is grounded in fact.

This is very interesting, because there must be a line here that AI is crossing, and the line is not clearly determined yet.

Is linting code crossing the line?

Is refactoring code with automated tools like Bicycle Repair Man crossing the line?

Is having AI do a code review and suggest code crossing the line?

Is writing code with a specific prompt and sample code crossing the line?

Is producing a high-level spec and letting the AI design the details and code the whole thing crossing the line?

So, where exactly is this line?

The next interesting question is how this could even be enforced. It's going to be hard to prove AI use when someone uses strictly local models. Maybe they could embed some watermark-like thing, but I am not sure this can't be circumvented.

I would really like to see some legal opinions on this (unlikely to happen :)

The best I found is here: https://copyrightlately.com/thaler-is-dead-ai-copyright-ques...

Here's what a Red Hat/IBM IP lawyer said about the chardet situation: https://github.com/chardet/chardet/issues/334#issuecomment-4...

Here's what the US Copyright Office says: https://newsroom.loc.gov/news/copyright-office-releases-part...

Yeah, that's what the link I posted also discusses (it goes into much detail, but then offers no actual resolution).

I guess we will have to wait for cases to be brought and resolved in the courts. Not a great recipe to be the leader in AI, it must be said.

An updated copyright bill from the legislature, or even positive regulatory action from the executive branch, would speed things up and give much more planning certainty to actors here in the US.

The rest of the world won't be waiting though -- maybe Europe, but Europe sadly doesn't really matter that much anymore :(

> No, it's not that simple. AI generated code isn't owned by anyone, it can't be copyrighted, so it cannot be licensed.

There is no way to reliably identify code as AI-generated unless it is explicitly labelled so. Good code produced by AI is no different from good code produced by a software engineer, so copyright is the last thing I would be worried about. Especially given that reviewing all pull requests is substantial curation work on the part of maintainers: even if submitted code is not copyrightable, the final product is.

At least with LLM providers, they have your prompts and output, and if they wanted to, they could identify what code was AI generated or not.

Maybe they can be subpoenaed, maybe they can sell the data to parties who care, like legal teams, maybe they can make it a service anyone can plug a GitHub repo into, etc.

Joke's on you: I run LLMs only locally, and besides, the most widely deployed code-generating tool, AFAIR, is JetBrains' tiny ~200M LLM, built into their IDE.

Do you really think anyone is ready to spend money on legal to prove that some piece of code is public domain/has no author? That's an expensive bet with an uncertain outcome. And of course you can recover some information only if logs exist, which might not be the case, especially if local inference was used.

> AI generated code isn't owned by anyone, it can't be copyrighted, so it cannot be licensed.

Translation: AI generated code is in the public domain in the US (until and unless something changes).

You can freely incorporate public domain code into any other codebase. You can relicense it as you see fit. Public domain material is not viral the way the GPL is.

Furthermore, if you make changes to public domain code, the derivative product is subject to copyright.

You can "relicense" it as you see fit, but anyone can just copy it and ignore your license and its terms entirely; it's not your property to put a license on.

See also Red Hat IP lawyer's opinion on trying to license the chardet "rewrite": https://github.com/chardet/chardet/issues/334#issuecomment-4...

From a Copyright Office report [0] linked in that issue:

> in many circumstances these outputs will be copyrightable in whole or in part—where AI is used as a tool, and where a human has been able to determine the expressive elements they contain. Prompts alone, however, at this stage are unlikely to satisfy those requirements

I'll just note that "unlikely" doesn't mean "not possible".

It's an interesting question but it's perplexing to me that an IP lawyer of all people would be suggesting to remove a license that might or might not apply in practice. Removing it could have legal consequences if it turns out to be copyrightable. Whereas leaving the notice there wouldn't be expected to have any consequences in the event that it turns out not to apply. Particularly the notice for a permissive OSI license.

To begin with, the project is MIT licensed, which amounts to public domain with extra steps. So it's the same thing at the end of the day regardless of how the law turns out.

On top of that, even if we suppose the LLM output to be public domain, as soon as you later make modifications to the file, the license in the header would come into force. It's functionally no different than copy-pasting a piece of public domain code and then editing it a bit.

[0] https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

Big tech employees better be quick then!