by polskibus 371 days ago
What is your team’s take on copyright for commits generated by an AI agent? Would copyright protect them?

The current US stance seems to be: https://www.copyright.gov/newsnet/2025/1060.html “It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements”.

If an entire commit is generated by AI, then it is obvious what created it - it’s AI. Such a commit might not be covered by copyright law. Is this something your team has already analysed?


This is a fascinating aspect that isn't discussed much. Until now, every text in human history was written by someone, and thus there was some kind of copyright.

Now we have text that is legally owned by nobody. Is it "public domain", though? That is not possible to verify, so maybe it is, but it still poses legal risks.

> If an entire commit is generated by AI, then it is obvious what created it - it’s AI.

Whether it's committed or not is irrelevant to the conclusion there; the question is what the input was.

For something like a compiler, where the output is mostly deterministic[0], I agree. For an AI trained on an unknown corpus that changes over time, the output is much less deterministic, and I would say you lose the human element needed for copyright claims.

If it can be shown that the same prompt, run through the AI several times over perhaps a year, produces the same output, then I will change my mind. Or if the AI achieves personhood.

[0] Allowances for register & loop optimization, etc.

> “It concludes that the outputs of generative AI can be protected by copyright only where a human author has determined sufficient expressive elements”

How would that work if it's a patch to a project under a copyleft license like the GPL, which requires all derivative works to be licensed the same way?

IANAL, but it means the commit itself is public domain. When integrated into a code base with a more restrictive license, you can still use that isolated snippet in whatever way you want.

A more interesting question is whether one could remove the GPL restrictions on public code by telling an AI to rewrite the code from scratch, providing only the behavior of the code.

This could be accomplished by having the AI generate a comprehensive test suite first, and then letting it write the app's code while seeing only the test suite.

Hmm, so basically automated clean room reimplementation, using coding agents? Our concepts of authorship, copying, and equivalence are getting a real workout these days!
You'd need pretty good opsec, a non-search-capable agent, and logs of all its actions/chain of thought/process to truly claim a clean-room implementation, tho.
The logs and traceability are the secret sauce here. It's one thing to have an artifact that mysteriously replicates the functionality of a well known IP-protected product without just straight up copying it. It's another thing to be able to demonstrate that said artifact was generated solely from information in the public domain or otherwise legally valid to use.
If it's of interest: I was investigating this and found that all the big labs, like OpenAI, offer an indemnity clause for enterprise customers that is supposed to assure you the model doesn't output code under non-compliant licenses (copyrighted, AGPL, or whatever), BUT you have to accept them keeping all your logs, give them access, and let them and their lawyers build their own case if you get sued.

I guess they're mostly selling insurance to BigCos and saying: hey, we have the money to go to court and an interest in winning such a case, so we'll handle it.

GPL is a copyright licence, not a ToS.
> GPL is a copyright licence, not a ToS.

How is ToS relevant to this thread?

AI Code and Copyright - Risky Business or Creative Power-Up (AI-Generated Podcast)

https://open.spotify.com/episode/6o2Ik3w6c4x4DYILXwRSos?si=5...

An unconventional license for AI-generated code. Maybe public domain, maybe not. Use freely, vibe responsibly.

https://jilvin.github.io/vibe-license/

> If an entire commit is generated by AI, then it is obvious what created it - it’s AI.

This is not the case. The output of a compiler is 100% created by a compiler too. Copyright is based on where the creative aspect comes from.

I have had very little luck getting 2025-era AIs to handle the creative aspects of coding (design, architecture, and similar), and that's doubly true for what appears to be the relatively simple model in Codex. As far as I can tell, Codex trades off model complexity for model time: the model does a massive amount of work for a relatively small change.

However, it is much better than I am at the mechanical aspects. LLMs can fix mechanical bugs almost instantly (the sort of build-process problem that has a cut-and-paste fix on Stack Overflow), and they can generate massive amounts of code without typos or shallow bugs.

A good analogy is working with power tools versus hand tools. I can do much more in one step, but I'm still in creative control.

The codebase I'm working on is pretty sophisticated; I can imagine they could implement more cookie-cutter things (e.g. a standard OAuth workflow) more automatically.

However, even there, or in discussions with larger models about my existing codebase, their creativity is partly based on human contributions to their training set. I'm not sure how to weigh that. An LLM-generated OAuth workflow might be considered the creative median of a lot of human-written code.

I write a lot of AGPL code, and at least in the 3.5 era, the models were clearly trained on my code and would happily print it out more or less verbatim. It got to the point where I complained to OpenAI about it at the time, but I never got a response. I suspect a lot of generated code now includes some fractional contribution from me (an infinitesimal fraction most of the time, but more substantial for niche code similar to my codebase).

So in generated code, we have a mixture of at least a few different pieces:

- The user's contributions (prompts, review, etc.)

- Machine contributions

- Training set contributions