I cringe every time I see Claude trying to co-author a commit. The git history is expected to track accountability and ownership, not your Bill of Tools. Should I also co-author my PRs with my linter, intellisense and IDE?
If those tools are writing the code then in general I do expect that to be included in the PR! Through my whole career I've seen PRs where people noted that code that was generated (people have been generating code since long before LLMs). It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself (which in my experience is the case where it's obvious boilerplate or the generated section is small).
Needing to flag nontrivial code as generated was standard practice for my whole career.
> It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself
If this is not the case you should not be sending it to public repos for review at all. It is rude and insulting to expect the people maintaining these repos to review code that nobody bothered to read.
As a rule, I commit the input to the code generation tool, i.e., what the GPL refers to as "the preferred form of the work for making modifications to it", generate as part of the build process, and, where possible, try to avoid code generation tools designed around the assumption that its output will be maintained rather than regenerated from modified input.
As for LLM code assistants, I don't really view them as traditional code generation tools in the first place, as in practice they more resemble something in between autocomplete and delegating to a junior programmer.
As for attribution, I view it more or less the same way as "dictated but not read" in written correspondance, i.e., an disclaimer for errors in the code, which may be considered rude in some contexts, and a perfectly acceptable and useful annotation in others.
Usually, pre-LLM generated code is flagged because people aren't expected to modify it by hand. If you find a bug and track it to the generated code, you are expected to fix the sources and re-generate.
This is not at all the case with LLM-generated code - mostly because you can't regenerate it even if you wanted to, as it's not deterministic.
That said, I do agree that LLM code is different enough from human code (even just in regards to potential copyright worries) that it should be mentioned that LLMs were used to create it.
Compilers don't usually write the code that ends up in a PR. But compilers do (and should) generally leave behind some metadata in the end result saying what tools were used, see for example the .comment section in ELF binaries.
Do you check in binaries into your git history? If so, you should mark a commit as generated, and the commit message (plus repository state) should be enough to recreate it 1:1.
Similarly, if I use e.g. jextract or uniffi to generate Java interfaces from C code and check that in, I'll create tooling to automatically run those, and the commit will be attributed to that tooling.
Compiler versions are usually included in the package manifest. Generally you include commit info compiler version and compilation date and platform embedded in the binaries that compilers produce.
Absolutely. Let's say I have a problem with gRPC and traced it to code generated using the gRPC compiler. I can reproduce it, highlight it and I'm pretty sure the gRPC team would address the issue.
Replace gRPC compiler with LLM. Can you reproduce? (probably not 100%). Can anybody fix it short of throwing more english phrases like "DO NOT", "NEVER", "Under No Circumstances"?
There are many techniques. You're most likely to come across things like declarative DSL:s and macros, then there are things like JAXB and similar tooling that generates code from data schemas, and some people script around data sources to glue boilerplate and so on.
Arguably snippet collections belong to this genre.
I am not against the general use of AI code. Quite simply, my view is that all relevant context for a review should be disclosed in the PR.
AI and humans are not the same as authors of PRs. As an obvious example: one of the important functions of the PR process is to teach the writer about how to code in this project but LLMs fundamentally don't learn the same way as humans so there's a meaningful difference in context between humans and AIs.
If a human takes the care to really understand and assume authorship of the PR then it's not really an issue (and if they do, they could easily modify the Claude messages to remove "generated by Claude" notes manually) but instead it seems that Claude is just hiding relevant context from the reviewer. PRs without relevant context are always frustrating.
What's really tricky with the legal protections area is this: 90% of the value of the S&P 500 is intangible. Meaning if you suck out the book value (10%), the rest is brand, IP, rights, sources & methods, etc. So if a company can't protect that, it's not particularly valuable anymore. Maybe we will see a shift back to tangible assets and book value (25,000 $8MM Vera Rubin machines) and away from intangibles...
I think this is just the beginning so people are apprehensive, rightfully so, at this stage. I agree with you that AI use should be disclosed but using the commit message as a billboard for Anthropic hell no. Go put an add on the free tier.
You don't generally commit compiled code to your VCS. If you do need to commit a binary for whatever reason, yeah it makes sense to explain how the binary was generated.
A whole lot of people find LLM code to be strictly objectionable, for a variety of reasons. We can debate the validity of those reasons, but I think that even if those reasons were all invalid, it would still be unethical to deceive people by a deliberate lie of omission. I don't turn it off, and I don't think other people should either.
For the purpose of disclosure, it should say “Warning: AI generated code” in the commit message, not an advertisement for a specific product. You would never accept any of your other tools injecting themselves into a commit message like that.
My tools just don't add such comments. I don't know why I would care to add that information. I want my commits to be what and why, not what editor someone used. It seems like cruft to me. Why would I add noise to my data to cater to someone's neuroticism?
At least at my workplace though, it's just assumed now that you are using the tools.
What editor you are using has no effect on things like copyright, while software that synthesises code might.
In commercial settings you are often required to label your produce and inform about things like 'Made in China' or possible adverse effects of consumption.
well if I know a specific LLM has certain tendencies (eg. some model is likely to introduce off-by-one errors), I would know what to look for in code-review
I mean, of course I would read most of the code during review, but as a human, I often skip things by mistake
If a whole of people thought that running code through a linter or formatter was objectionable, I'd probably just dismiss their beliefs as invalid rather than adding the linter or formatter as a co-author to every commit.
Linters and formatters are different tools then LLMs. There is a general understanding that linters and formatters don’t alter the behavior of your program. And even still most projects require a particular linter and a formatter to pass before a PR is accepted, and will flag a PR as part of the CI pipeline if a particular linter or a particular formatter fails on the code you wrote. This particular linter and formatter is very likely to be mentioned somewhere in the configuration or at least in the README of the project.
Like frying a veggie burger in bacon grease. Just because somebody's beliefs are dumb doesn't mean we should be deliberately tricking them. If they want to opt out of your code, let them.
Never fried one in bacon grease, but they are good with bacon and cheese. I have had more than one restaurant point out that their bacon wasn't vegetarian when ordering, though.
I've heard similar things before. Frying a veggie burger in bacon grease to sneakily feed someone meat/meat-byproducts who does not want to eat it, like a vegan or a person following certain religious observances. As in, it's not ok to do this even if you think their beliefs are stupid.
In my view, vegans are dumb but it's still unethical to trick them into eating something they ordinarily wouldn't. Does that make sense to you? I am not asking you to agree with me on the merits of veganism, I am explaining why the merits of veganism shouldn't even matter when it comes to the question of deliberately trying to trick them.
Can you see a world where everyone has an AI Persona based on their prior work that acts like a RAG to inform how things should be coded? Meaning this is patent qualified code because, despite being AI configured, it is based on my history of coding?
Likewise. I don’t mind that people use LLMs to generate text and code. But I want any LLM generated stuff to be clearly marked as such. It seems dishonest and cheap to get Claude to write something and then pretend you did all the work yourself.
You can disclose that you used an LLM in the process of writing code in other ways, though. You can just tell people, you can mention it in the PR, you can mention it in a ticket, etc.
+1. If we’re at an early stage in the agentic curve where we think reading commit messages is going to matter, I don’t want those cluttered with meaningless boilerplate (“co-authored by my tools!”).
But at this point i am more curious if git will continue to be the best tool.
I'm only beginning to use "agentic" LLM tools atm because we finally gained access to them at work, and the rest of my team seems really excited about using them.
But for me at least, a tool like Git seems pretty essential for inspecting changes and deciding which to keep, which to reroll, and which to rewrite. (I'm not particularly attached to Git but an interface like Magit and a nice CLI for inspecting and manipulating history seem important to me.)
What are you imagining VCS software doing differently that might play nicer with LLM agents?
Check out Mitchell Hashimoto’s podcast episode on the pragmatic engineer. He starts talking about AI at 1:16:41. At some point after that he discusses git specifically, and how in some cases it becomes impossible to push because the local branch is always out of date.
So if I use Claude to write the first pass at the code, make a few changes myself, ask it to make an additional change, change another thing myself, then commit it — what exactly do you expect to see then?
I don't think so. The tag doesn't just say "this was written by an LLM". It says which LLM - which model - authored it. As LLMs get more mature, I expect this information will have all sorts of uses.
It'll also become more important to know what code was actually written by humans.
If you accept the code generated by them nearly verbatim, absolutely.
I don't understand why people consider Claude-generated code to be their own. You authored the prompts, not the code. Somehow this was never a problem with pre-LLM codegen tools, like macro expanders, IPC glue, or type bundle generators. I don't recall anybody desperately removing the "auto-generated do not edit" comments those tools would nearly always slap at the top of each file or taking offense when someone called that code auto-generated. Back in the day we even used to publish the "real" human-written source for those, along with build scripts!
It's weird, because they should not consider it as their own, but they should take accountability from it.
Ideally, if I contribute to any codebase, what needs to be judged is the resulting code. Is it up to the project's standards ? Does the maintainer have design objections ?
What tool you use shouldn't matter, be it your IDE or your LLM.
But that also means you should be accountable for it, you shouldn't defend behind "But Claude did this poorly, not me !", I don't care (in a friendly way), just fix the code if you want to contribute.
The big caveat to this is not wanting AI-Generated code for ideological reasons, and well, if you want that you can make your contributors swear they wrote it by themselves in the PR text or whatever.
I'm not really sure how to feel about this, but I stand by my "the code is what matters" line.
Some differences with the human source for those kinds of tools: (1) the resultant generated code was deterministic (2) it was usually possible to get access to the exact version of the tool that generated it
Since AI tools are constantly obsoleted, generate different output each run, and it is often impossible to run them locally, the input prompts are somewhat useless for everyone but the initial user.
Well is it actually being used as a tool where the author has full knowledge and mental grasp of what is being checked in, or has the person invoked the AI and ceded thought and judgment to the AI? I.e., I think in many cases the AI really is the author, or at least co-author. I want to know that for attribution and understanding what went into the commit. (I agree with you if it's just a tool.)
Pre-LLM, it was much easier for reviewers to discern that. Now, the AI-generated code can look like it was well thought out by somebody competent, when it wasn't.
Have you ever reviewed an AI-generated commit from someone with insufficient competence that was more compelling than their work would be if it was done unassisted? In my experience it’s exactly the opposite. AI-generation aggravates existing blindspots. This is because, excluding malicious incompetence, devs will generally try to understand what they’re doing if they’re doing it without AI
I have. It's always more compelling in a web diff. These guys are the first coworkers for which it became absolutely necessary for me to review their work by pulling down all their code and inspecting every line myself in the context of the full codebase.
I try to understand what the llm is doing when it generates code. I understand that I'm still responsible for the code I commit even if it's llm generated so I may as well own it.
Yes, it sets the reviewer's expectations around how much effort was spent reviewing the code before it was sent.
I regularly have tool-generated commits. I send them out with a reference to the tool, what the process is, how much it's been reviewed and what the expectation is of the reviewer.
Otherwise, they all assume "human authored" and "human sponsored". Reviewers will then send comments (instead of proposing the fix themselves). When you're wrangling several hundred changes, that becomes unworkable.
Tools do author commits in my code bases, for example during a release pipeline. If I had commits being made by Claude I would expect that to be recorded too. It isn't for recording a bill of tools, just to help understand a projects evolution.
>legally speaking.. if you're not sure of the risk- you don't document it.
Ah, so you kinda maybe sorta absolve yourself of culpability (but not really — "I didn't know this was copyrighted material" didn't grant you copyright), and simultaneously make fixing the potentially compromised codebase (someone else's job, hopefully) 100x harder because the history of which bits might've been copied was never kept.
Solid advice! As ethical as it is practical.
By the same measure, junkyards should avoid keeping receipts on the off chance that the catalytic converters some randos bring in after midnight are stolen property.
Better not document it.
One little trick the legal folks don't want you to know!
Yea in my Claude workflow, I still make all the commits myself.
This is also useful for keeping your prompts commit-sized, which in my experience gives much better results than just letting it spin or attempting to one-shot large features.
No, because those things don't change the logical underpinnings of the code itself. LLM-written code does act in ways different enough from a human contributor that it's worth flagging for the reviewer.
Huh? Unless the sole purpose of the commit was to lint code, it would be unnecessary fluff to append the name of the automatically linted tools that ran in a pre-commit hook in every commit.
In some jurisdictions (e.g. the UK) the law is already clear that you own the copyright. In the US it is almost certain that you will be the author. The reports of cases saying otherwise I have been misreported - the courts found the AI could not own the copyright.
Thaler v. Perlmutter: The D.C. Circuit Court affirmed in March 2025 that the Copyright Act requires works to be authored "in the first instance by a human being," a ruling the Supreme Court left intact by declining to hear the case in 2026.
Authors and inventors, courts have ruled, means people. Only people. A monkey taking a selfie with your camera doesn't mean you own a copyright. An AI generating code with your computer is likewise, devoid of any copyright protection.
The ruling says that the LLM cannot be the author. It does not say that the human being using the LLM cannot be the author. The ruling was very clear that it did not address whether a human being was the copyright holder because Thaler waived that argument.
the position with a monkey using your camera is similar, and you may or may not hold the copyright depending on what you did - was it pure accident or did you set things up. Opinions on the well known case are mixed: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
Where wildlife photographers deliberately set up a shot to be triggered automatically (e.g. by a bird flying through the focus) they do hold the copyright.
AI generated code has no copyright. And if it DID somehow have copyright, it wouldn't be yours. It would belong to the code it was "trained" on. The code it algorithmically copied. You're trying to have your cake, and eat it too. You could maybe claim your prompts are copyrighted, but that's not what leaked. The AI generated code leaked.
The linked document labeled "Part 2: Copyrightability", section V. "Conclusions" states the following:
> the Copyright Office
concludes that existing legal doctrines are adequate and appropriate to resolve questions of
copyrightability. Copyright law has long adapted to new technology and can enable case-by-
case determinations as to whether AI-generated outputs reflect sufficient human contribution to
warrant copyright protection. As described above, in many circumstances these outputs will be
copyrightable in whole or in part—where AI is used as a tool, and where a human has been able
to determine the expressive elements they contain. Prompts alone, however, at this stage are
unlikely to satisfy those requirements.
So the TL;DR basically implies pure slop within the current guidelines outlined in conclusions is NOT copyrightable. However collaboration with an AI copyrightability is determined on a case by case basis. I will preface this all with the standard IANAL, I could be wrong etc, but with the concluding language using "unlikely" copyrightable for slop it sounds less cut and dry than you imply.
It's beyond obvious that a LLM cannot have copyright, any more than a cat or a rock can. The question is whether anyone has or if whatever content generated by a LLM simply does not constitute a work and is thus outside the entire copyright law. As far as I can see, it depends on the extent of the user's creative effort in controlling the LLM's output.
It may be obvious to you, but it has lead to at least one protracted court case in the US: Thaler v. Perlmutter.
> The question is whether anyone has or if whatever content generated by a LLM simply does not constitute a work and is thus outside the entire copyright law.
Its is going to vary with copyright law. In the UK the question of computer generated works is addressed by copyright law and the answer is "the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken"
Its also not a simple case of LLM generated vs human authored. How much work did the human do? What creative input was there? How detailed were the prompts?
In jurisdictions where there are doubts about the question, I think code is a tricky one. If the argument that prompts are just instructions to generate code, therefore the code is not covered by copyright, then you could also argue that code is instructions to a compiler to generate code and the resulting binary is not covered by copyright.
The binary should be considered "derived work". Only the original copyright owner has the exclusive right to create or authorize derivative works. Means you are not allowed to compile code unless you have the license to do so. Right?
Yes, so is LLM generated code a derivative work of the prompts? Does it matter how detailed the prompts are? How much the code conforms to what is already written (e.g. writing tests)?
It looks like it will be decided on a case by case basis.
It will also differ between countries, so if you are distributing software internationally what will be a constraint on treating the code as not copyrightable.
It is not "beyond obvious" that a cat cannot have copyright, given the lawsuit about a monkey holding copyright [1], and the way PETA tried to used that case as precedent to establish that any animal can hold copyright.
Anthropic could at least make a compelling case for the copyright.
It becomes legally challenging with regards to ownership if I ever use work equipment for a personal project. If it later takes off they could very well try to claim ownership in its entirety simply because I ran a test once (yes, there's a while silicon valley season for it).
I don't know if they'd win, but Anthropic absolutely would be able to claim the creation of that code was done on their hardware. Obviously we aren't employees of theirs, though we are customers that very likely never read what we agreed to in a signup flow.
Using work equipment for a personal project only matters because you signed a contract giving all of your IP to your employer for anything you did with (or sometimes without) your employer's equipment.
Anthropic's user agreement does not have a similar agreement.
My point was that they could make a compelling case though, not that they would win.
I don't know of ant precedent where the code was literally generated on someone else's system. Its an open question whether that implies any legal right to the work and I could pretty easily see a court accepting the case.
Who owns the copyright for something not written by anybody, you ask? Is it the man who pays to have it written, or the owner of the machine that does the writing? But it is neither. Nobody owns the copyright because nobody has written it.
Needing to flag nontrivial code as generated was standard practice for my whole career.