Hacker News
by manbitesdog 73 days ago
I cringe every time I see Claude trying to co-author a commit. The git history is expected to track accountability and ownership, not your Bill of Tools. Should I also co-author my PRs with my linter, intellisense and IDE?
20 comments

If those tools are writing the code then in general I do expect that to be included in the PR! Through my whole career I've seen PRs where people noted that code was generated (people have been generating code since long before LLMs). It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself (which in my experience is the case where it's obvious boilerplate or the generated section is small).

Needing to flag nontrivial code as generated was standard practice for my whole career.

> It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself

If this is not the case you should not be sending it to public repos for review at all. It is rude and insulting to expect the people maintaining these repos to review code that nobody bothered to read.

Sometimes code generation is a useful tool, and maybe people have read and reviewed the generator.

The difference here is that the generator is a non-deterministic LLM and you can't reason about its output the same way.

As a rule, I commit the input to the code generation tool, i.e., what the GPL refers to as "the preferred form of the work for making modifications to it", generate as part of the build process, and, where possible, try to avoid code generation tools designed around the assumption that their output will be maintained rather than regenerated from modified input.

As for LLM code assistants, I don't really view them as traditional code generation tools in the first place, as in practice they more resemble something in between autocomplete and delegating to a junior programmer.

As for attribution, I view it more or less the same way as "dictated but not read" in written correspondence, i.e., a disclaimer for errors in the code, which may be considered rude in some contexts, and a perfectly acceptable and useful annotation in others.

"Here's what AI came up with and it mostly worked the one time I tested it. Might need improving".

No. I don't want to test and pick through your shitty LLM generated code. If I wanted the entire code base to be junk, it'd say so in the readme.

Usually, pre-LLM generated code is flagged because people aren't expected to modify it by hand. If you find a bug and track it to the generated code, you are expected to fix the sources and re-generate.

This is not at all the case with LLM-generated code - mostly because you can't regenerate it even if you wanted to, as it's not deterministic.

That said, I do agree that LLM code is different enough from human code (even just in regards to potential copyright worries) that it should be mentioned that LLMs were used to create it.
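To make the "fix the sources and re-generate" workflow above concrete, here is a minimal sketch of a deterministic generator plus a staleness check. Everything in it is hypothetical and for illustration only: the `generate` and `is_stale` helpers, the header wording, and the example spec are assumptions, not any real tool's behaviour.

```python
# Hypothetical header, in the spirit of the "auto-generated do not edit" banners.
HEADER = "# AUTO-GENERATED from the spec. DO NOT EDIT; change the spec and regenerate.\n"


def generate(names: list[str]) -> str:
    """Render a tiny Python module that maps integer codes to names.

    The output depends only on the input list, so the same spec
    always yields byte-identical text.
    """
    lines = [HEADER, "NAMES = {\n"]
    for code, name in enumerate(names):
        lines.append(f"    {code}: {name!r},\n")
    lines.append("}\n")
    return "".join(lines)


def is_stale(names: list[str], checked_in: str) -> bool:
    """Return True if the checked-in text no longer matches a fresh generation."""
    return generate(names) != checked_in


spec = ["Placebo", "Aspirin"]   # hypothetical spec, the "preferred form" to edit
generated = generate(spec)      # what would be committed or built
```

A check like `is_stale(spec, generated)` only works because the generator is deterministic, which is the property the comment above says LLM output lacks.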

> If those tools are writing the code then in general I do expect that to be included in the PR!

How about compiler?

Compilers don't usually write the code that ends up in a PR. But compilers do (and should) generally leave behind some metadata in the end result saying what tools were used, see for example the .comment section in ELF binaries.
Are you checking in compiled artifacts? Then yeah, we should have a chain of where that binary blob came from.
Do you check in binaries into your git history? If so, you should mark a commit as generated, and the commit message (plus repository state) should be enough to recreate it 1:1.

Similarly, if I use e.g. jextract or uniffi to generate Java interfaces from C code and check that in, I'll create tooling to automatically run those, and the commit will be attributed to that tooling.

Compiler versions are usually included in the package manifest. Generally you embed commit info, compiler version, compilation date, and platform in the binaries that compilers produce.
Absolutely. Let's say I have a problem with gRPC and traced it to code generated using the gRPC compiler. I can reproduce it, highlight it and I'm pretty sure the gRPC team would address the issue.

Replace gRPC compiler with LLM. Can you reproduce? (probably not 100%). Can anybody fix it short of throwing more English phrases like "DO NOT", "NEVER", "Under No Circumstances"?

Probably not.

>It's useful context unless you've gone over the generated code and understand it and it is the same quality as if you wrote it yourself

I thought the argument was that AI-users were reviewing and understanding all of the code?

> people have been generating code since long before LLMs

How? LSTM?

There are many techniques. You're most likely to come across things like declarative DSLs and macros, then there are things like JAXB and similar tooling that generates code from data schemas, and some people script around data sources to glue boilerplate and so on.

Arguably snippet collections belong to this genre.

For example `rails generate ...` built into the Rails CLI.
See, for example, this blog post from 2014: https://go.dev/blog/generate

The following comment in the blog post

    //go:generate stringer -type=Pill
generates a .._string.go file which contains a '.String()' method.

I would find it very reasonable to commit that with 'Co-Authored-By: stringer v0.1.0' or such.

Or 'sed s/a/b/g' and 'Co-Authored-By: sed'

Holy shit I’m old.
You assemble all your machine code using a magnetized needle?
I am not against the general use of AI code. Quite simply, my view is that all relevant context for a review should be disclosed in the PR.

AI and humans are not the same as authors of PRs. As an obvious example: one of the important functions of the PR process is to teach the writer about how to code in this project but LLMs fundamentally don't learn the same way as humans so there's a meaningful difference in context between humans and AIs.

If a human takes the care to really understand and assume authorship of the PR then it's not really an issue (and if they do, they could easily modify the Claude messages to remove "generated by Claude" notes manually) but instead it seems that Claude is just hiding relevant context from the reviewer. PRs without relevant context are always frustrating.

What's really tricky with the legal protections area is this: 90% of the value of the S&P 500 is intangible. Meaning if you suck out the book value (10%), the rest is brand, IP, rights, sources & methods, etc. So if a company can't protect that, it's not particularly valuable anymore. Maybe we will see a shift back to tangible assets and book value (25,000 $8MM Vera Rubin machines) and away from intangibles...
I think this is just the beginning so people are apprehensive, rightfully so, at this stage. I agree with you that AI use should be disclosed, but using the commit message as a billboard for Anthropic? Hell no. Go put an ad on the free tier.
You don't generally commit compiled code to your VCS. If you do need to commit a binary for whatever reason, yeah it makes sense to explain how the binary was generated.
You do usually pin your compiler version though, or at the very least set a minimum version
Don't be silly.

I use good ol' C-x M-c M-butterfly.

https://xkcd.com/378/

Sometimes using AI to code feels closer to a Butterfly than emacs right?
A whole lot of people find LLM code to be strictly objectionable, for a variety of reasons. We can debate the validity of those reasons, but I think that even if those reasons were all invalid, it would still be unethical to deceive people by a deliberate lie of omission. I don't turn it off, and I don't think other people should either.
For the purpose of disclosure, it should say “Warning: AI generated code” in the commit message, not an advertisement for a specific product. You would never accept any of your other tools injecting themselves into a commit message like that.
My last commit is literally authored by dependabot.
well, you 100% know what dependabot does
Leaves you open to vulnerabilities in overnight builds of NPM packages that increasingly happen due to LLM slop?
You can set a minimum age for packages (https://docs.github.com/en/code-security/reference/supply-ch...), though that's not perfect (and becomes less effective if everyone uses it).
But how much AI-generated code? If it's just a smallish function or two while most of the code was written by hand?
My tools just don't add such comments. I don't know why I would care to add that information. I want my commits to be what and why, not what editor someone used. It seems like cruft to me. Why would I add noise to my data to cater to someone's neuroticism?

At least at my workplace though, it's just assumed now that you are using the tools.

What editor you are using has no effect on things like copyright, while software that synthesises code might.

In commercial settings you are often required to label your produce and inform about things like 'Made in China' or possible adverse effects of consumption.

well if I know a specific LLM has certain tendencies (eg. some model is likely to introduce off-by-one errors), I would know what to look for in code-review

I mean, of course I would read most of the code during review, but as a human, I often skip things by mistake

Tbh as long as the PR looks good, it's good to go for internal testing.
If a whole lot of people thought that running code through a linter or formatter was objectionable, I'd probably just dismiss their beliefs as invalid rather than adding the linter or formatter as a co-author to every commit.
A linter or a formatter does not open you up to compliance and copyright issues.
Linters and formatters are different tools than LLMs. There is a general understanding that linters and formatters don’t alter the behavior of your program. And even still most projects require a particular linter and a formatter to pass before a PR is accepted, and will flag a PR as part of the CI pipeline if a particular linter or a particular formatter fails on the code you wrote. This particular linter and formatter is very likely to be mentioned somewhere in the configuration or at least in the README of the project.
Like frying a veggie burger in bacon grease. Just because somebody's beliefs are dumb doesn't mean we should be deliberately tricking them. If they want to opt out of your code, let them.
> frying a veggie burger in bacon grease

hmm gotta try that

I love black bean burgers (bongo burger near Berkeley is my classic), sounds like an interesting twist
Never fried one in bacon grease, but they are good with bacon and cheese. I have had more than one restaurant point out that their bacon wasn't vegetarian when ordering, though.
In your view, those who prefer veggie burgers are dumb. Am I misinterpreting?
I've heard similar things before. Frying a veggie burger in bacon grease to sneakily feed someone meat/meat-byproducts who does not want to eat it, like a vegan or a person following certain religious observances. As in, it's not ok to do this even if you think their beliefs are stupid.
In my view, vegans are dumb but it's still unethical to trick them into eating something they ordinarily wouldn't. Does that make sense to you? I am not asking you to agree with me on the merits of veganism, I am explaining why the merits of veganism shouldn't even matter when it comes to the question of deliberately trying to trick them.
Can you see a world where everyone has an AI Persona based on their prior work that acts like a RAG to inform how things should be coded? Meaning this is patent qualified code because, despite being AI configured, it is based on my history of coding?
Likewise. I don’t mind that people use LLMs to generate text and code. But I want any LLM generated stuff to be clearly marked as such. It seems dishonest and cheap to get Claude to write something and then pretend you did all the work yourself.
The reason I want it to be marked as such is because I review AI code differently than human code - it just makes different kinds of mistakes.
You can disclose that you used an LLM in the process of writing code in other ways, though. You can just tell people, you can mention it in the PR, you can mention it in a ticket, etc.
+1. If we’re at an early stage in the agentic curve where we think reading commit messages is going to matter, I don’t want those cluttered with meaningless boilerplate (“co-authored by my tools!”).

But at this point I am more curious if git will continue to be the best tool.

I'm only beginning to use "agentic" LLM tools atm because we finally gained access to them at work, and the rest of my team seems really excited about using them.

But for me at least, a tool like Git seems pretty essential for inspecting changes and deciding which to keep, which to reroll, and which to rewrite. (I'm not particularly attached to Git but an interface like Magit and a nice CLI for inspecting and manipulating history seem important to me.)

What are you imagining VCS software doing differently that might play nicer with LLM agents?

Of course git is great!

Check out Mitchell Hashimoto’s podcast episode on the pragmatic engineer. He starts talking about AI at 1:16:41. At some point after that he discusses git specifically, and how in some cases it becomes impossible to push because the local branch is always out of date.

So if I use Claude to write the first pass at the code, make a few changes myself, ask it to make an additional change, change another thing myself, then commit it — what exactly do you expect to see then?
A Co-Authored-By tag on the commit. It's a standard practice and the meaning is self-explanatory. This is what Claude adds by default too.
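For readers who haven't looked at how such a tag sits in a commit, here is a minimal sketch of pulling `Co-Authored-By` trailers out of a commit message. It is a simplified parser written for illustration, not a reimplementation of `git interpret-trailers`; the helper names, the sample message, and the addresses in it are all made up.

```python
import re

# Simplified assumption: a trailer is a "Key: value" line in the last paragraph.
TRAILER_RE = re.compile(r"^([A-Za-z0-9-]+):\s*(.+)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs from the final paragraph of a commit message.

    The final paragraph counts as a trailer block only if every line in it
    looks like "Key: value"; otherwise an empty list is returned.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", message.strip()) if p.strip()]
    if len(paragraphs) < 2:
        return []  # a subject line on its own carries no trailers
    trailers = []
    for line in paragraphs[-1].strip().splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return []
        trailers.append((match.group(1), match.group(2).strip()))
    return trailers


def co_authors(message: str) -> list[str]:
    """Return the values of all Co-Authored-By trailers (key is case-insensitive)."""
    return [v for k, v in parse_trailers(message) if k.lower() == "co-authored-by"]


# Hypothetical commit message used only to exercise the parser.
example = """Fix off-by-one in pagination

The last page was dropped when the total was a multiple of the page size.

Co-Authored-By: Claude <noreply@example.com>
Signed-off-by: Jane Doe <jane@example.com>
"""
```

The human committer stays in the author field; the trailer only adds a second name, which is why the tag does not hide who made the commit.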
I make the commits myself, I don't let Claude commit anything.
I guess if enough people use it, doesn’t the tag become kind of redundant?

Almost like writing “Code was created with the help of IntelliSense”.

I don't think so. The tag doesn't just say "this was written by an LLM". It says which LLM - which model - authored it. As LLMs get more mature, I expect this information will have all sorts of uses.

It'll also become more important to know what code was actually written by humans.

I'm not really sure that's any of their business.
If you accept the code generated by them nearly verbatim, absolutely.

I don't understand why people consider Claude-generated code to be their own. You authored the prompts, not the code. Somehow this was never a problem with pre-LLM codegen tools, like macro expanders, IPC glue, or type bundle generators. I don't recall anybody desperately removing the "auto-generated do not edit" comments those tools would nearly always slap at the top of each file or taking offense when someone called that code auto-generated. Back in the day we even used to publish the "real" human-written source for those, along with build scripts!

It's weird, because they should not consider it as their own, but they should take accountability for it.

Ideally, if I contribute to any codebase, what needs to be judged is the resulting code. Is it up to the project's standards? Does the maintainer have design objections?

What tool you use shouldn't matter, be it your IDE or your LLM.

But that also means you should be accountable for it, you shouldn't hide behind "But Claude did this poorly, not me!", I don't care (in a friendly way), just fix the code if you want to contribute.

The big caveat to this is not wanting AI-Generated code for ideological reasons, and well, if you want that you can make your contributors swear they wrote it by themselves in the PR text or whatever.

I'm not really sure how to feel about this, but I stand by my "the code is what matters" line.

Sounds a bit like the label "organic (food)" could be applied to hand-written code?
Some differences with the human source for those kinds of tools: (1) the resultant generated code was deterministic (2) it was usually possible to get access to the exact version of the tool that generated it

Since AI tools are constantly obsoleted, generate different output each run, and it is often impossible to run them locally, the input prompts are somewhat useless for everyone but the initial user.

Well is it actually being used as a tool where the author has full knowledge and mental grasp of what is being checked in, or has the person invoked the AI and ceded thought and judgment to the AI? I.e., I think in many cases the AI really is the author, or at least co-author. I want to know that for attribution and understanding what went into the commit. (I agree with you if it's just a tool.)
I have worked with quite a few people committing code they didn't fully understand.

I don't mean this as a drive by bazinga either, the practice of copying code or thinking you understand it when you don't is nothing new

Pre-LLM, it was much easier for reviewers to discern that. Now, the AI-generated code can look like it was well thought out by somebody competent, when it wasn't.
Have you ever reviewed an AI-generated commit from someone with insufficient competence that was more compelling than their work would be if it was done unassisted? In my experience it’s exactly the opposite. AI-generation aggravates existing blindspots. This is because, excluding malicious incompetence, devs will generally try to understand what they’re doing if they’re doing it without AI
I think the issue is not that the patches are more compelling but that they're significantly larger and more frequent
I have. It's always more compelling in a web diff. These guys are the first coworkers for which it became absolutely necessary for me to review their work by pulling down all their code and inspecting every line myself in the context of the full codebase.
I try to understand what the llm is doing when it generates code. I understand that I'm still responsible for the code I commit even if it's llm generated so I may as well own it.
Yes and if they copy and paste code they don’t understand then they should disclose that in the commit message too!
Yes, it sets the reviewer's expectations around how much effort was spent reviewing the code before it was sent.

I regularly have tool-generated commits. I send them out with a reference to the tool, what the process is, how much it's been reviewed and what the expectation is of the reviewer.

Otherwise, they all assume "human authored" and "human sponsored". Reviewers will then send comments (instead of proposing the fix themselves). When you're wrangling several hundred changes, that becomes unworkable.

Sent from my iPhone
> Should I also co-author my PRs with my linter, intellisense and IDE?

Absolutely. That would be hilarious.

Torvalds promotes exactly that. https://github.com/torvalds/linux/blob/master/Documentation/...

Assisted-by: Claude:claude-3-opus coccinelle sparse
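As a rough illustration of how a line like the one above could be consumed by tooling, here is a sketch that splits it into an agent, a model, and a list of extra tools. The field layout (`agent:model` followed by space-separated tool names) is an assumption inferred from that single example line, and `AssistedBy` and `parse_assisted_by` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistedBy:
    agent: str               # e.g. "Claude"
    model: str               # e.g. "claude-3-opus"; empty if no ":model" part
    tools: tuple[str, ...]   # any further space-separated names


def parse_assisted_by(line: str) -> AssistedBy:
    """Parse one 'Assisted-by:' line under the layout assumed above."""
    key, sep, value = line.partition(":")
    if not sep or key.strip().lower() != "assisted-by":
        raise ValueError(f"not an Assisted-by line: {line!r}")
    fields = value.split()
    if not fields:
        raise ValueError("Assisted-by line has no value")
    # The first field is assumed to be "agent:model"; the rest are tool names.
    agent, _, model = fields[0].partition(":")
    return AssistedBy(agent=agent, model=model, tools=tuple(fields[1:]))


parsed = parse_assisted_by("Assisted-by: Claude:claude-3-opus coccinelle sparse")
```

Having the model name as a separate field is what would let a reviewer filter history by model, which is the use case raised elsewhere in the thread.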

Tools do author commits in my code bases, for example during a release pipeline. If I had commits being made by Claude I would expect that to be recorded too. It isn't for recording a bill of tools, just to help understand a project's evolution.
I suspect vibe coders might actually want you to consider turning to Claude for accountability and ownership rather than the human orchestrator.

If your linter is able to action requests, then it probably makes sense to add too.

Eh, there are some very good reasons[0] that you would do better to track your usage of LLM derived code (primarily for legal reasons)

[0]: https://www.jvt.me/posts/2026/02/25/llm-attribute/

legally speaking.. if you're not sure of the risk- you don't document it.
>legally speaking.. if you're not sure of the risk- you don't document it.

Ah, so you kinda maybe sorta absolve yourself of culpability (but not really — "I didn't know this was copyrighted material" didn't grant you copyright), and simultaneously make fixing the potentially compromised codebase (someone else's job, hopefully) 100x harder because the history of which bits might've been copied was never kept.

Solid advice! As ethical as it is practical.

By the same measure, junkyards should avoid keeping receipts on the off chance that the catalytic converters some randos bring in after midnight are stolen property.

Better not document it.

One little trick the legal folks don't want you to know!

Seems ethical
Yea in my Claude workflow, I still make all the commits myself.

This is also useful for keeping your prompts commit-sized, which in my experience gives much better results than just letting it spin or attempting to one-shot large features.

No, because those things don't change the logical underpinnings of the code itself. LLM-written code does act in ways different enough from a human contributor that it's worth flagging for the reviewer.
> The git history is expected to track accountability and ownership, not your Bill of Tools.

The point isn't to hijack accountability. It's free publicity, like how Apple adds "Sent from my IPhone."

Sent from my Ipad
I've heard of employers requiring people to do it for all code written with even a whiff of it
Could be cool if your PRs link back to a blog where you write about your tools.
> Should I also co-author my PRs with my linter, intellisense and IDE?

Kinda, yeah. If I automatically apply lint suggestions, I would title my commit "apply lint suggestions".

Huh? Unless the sole purpose of the commit was to lint code, it would be unnecessary fluff to append the name of the automatically linted tools that ran in a pre-commit hook in every commit.
well maybe?

co-authoring doesn't hide your authorship

if I see someone committing a blatantly wrong code, I would wonder what tool they actually used

You have copyright to a commit authored by you. You (almost certainly) don't have copyright (nobody has) to a commit authored by Claude.
Where is there any legal precedent for that?

In some jurisdictions (e.g. the UK) the law is already clear that you own the copyright. In the US it is almost certain that you will be the author. The cases reported as saying otherwise have been misreported - the courts found the AI could not own the copyright.

>Where is there any legal precedent for that?

Thaler v. Perlmutter: The D.C. Circuit Court affirmed in March 2025 that the Copyright Act requires works to be authored "in the first instance by a human being," a ruling the Supreme Court left intact by declining to hear the case in 2026.

And in the US constitution,

https://constitution.congress.gov/browse/article-1/section-8...

Authors and inventors, courts have ruled, means people. Only people. A monkey taking a selfie with your camera doesn't mean you own a copyright. An AI generating code with your computer is likewise devoid of any copyright protection.

The Thaler ruling addresses a different point.

The ruling says that the LLM cannot be the author. It does not say that the human being using the LLM cannot be the author. The ruling was very clear that it did not address whether a human being was the copyright holder because Thaler waived that argument.

the position with a monkey using your camera is similar, and you may or may not hold the copyright depending on what you did - was it pure accident or did you set things up. Opinions on the well known case are mixed: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

Where wildlife photographers deliberately set up a shot to be triggered automatically (e.g. by a bird flying through the focus) they do hold the copyright.

Guidance on AI is unambiguous.

https://www.copyright.gov/ai/

AI generated code has no copyright. And if it DID somehow have copyright, it wouldn't be yours. It would belong to the code it was "trained" on. The code it algorithmically copied. You're trying to have your cake, and eat it too. You could maybe claim your prompts are copyrighted, but that's not what leaked. The AI generated code leaked.

The linked document labeled "Part 2: Copyrightability", section V. "Conclusions" states the following:

> the Copyright Office concludes that existing legal doctrines are adequate and appropriate to resolve questions of copyrightability. Copyright law has long adapted to new technology and can enable case-by-case determinations as to whether AI-generated outputs reflect sufficient human contribution to warrant copyright protection. As described above, in many circumstances these outputs will be copyrightable in whole or in part—where AI is used as a tool, and where a human has been able to determine the expressive elements they contain. Prompts alone, however, at this stage are unlikely to satisfy those requirements.

So the TL;DR basically implies that pure slop is NOT copyrightable under the current guidelines outlined in the conclusions. For collaboration with an AI, however, copyrightability is determined on a case-by-case basis. I will preface this all with the standard IANAL, I could be wrong etc., but with the concluding language saying prompts alone are "unlikely" to qualify, it sounds less cut and dry than you imply.

can you tell me where exactly in the documents you link to it says that?
It's beyond obvious that a LLM cannot have copyright, any more than a cat or a rock can. The question is whether anyone has or if whatever content generated by a LLM simply does not constitute a work and is thus outside the entire copyright law. As far as I can see, it depends on the extent of the user's creative effort in controlling the LLM's output.
It may be obvious to you, but it has led to at least one protracted court case in the US: Thaler v. Perlmutter.

> The question is whether anyone has or if whatever content generated by a LLM simply does not constitute a work and is thus outside the entire copyright law.

It is going to vary with copyright law. In the UK the question of computer generated works is addressed by copyright law and the answer is "the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken"

It's also not a simple case of LLM generated vs human authored. How much work did the human do? What creative input was there? How detailed were the prompts?

In jurisdictions where there are doubts about the question, I think code is a tricky one. If the argument that prompts are just instructions to generate code, therefore the code is not covered by copyright, then you could also argue that code is instructions to a compiler to generate code and the resulting binary is not covered by copyright.

The binary should be considered "derived work". Only the original copyright owner has the exclusive right to create or authorize derivative works. Means you are not allowed to compile code unless you have the license to do so. Right?
Yes, so is LLM generated code a derivative work of the prompts? Does it matter how detailed the prompts are? How much the code conforms to what is already written (e.g. writing tests)?

It looks like it will be decided on a case by case basis.

It will also differ between countries, so if you are distributing software internationally that will be a constraint on treating the code as not copyrightable.

According to the law, if I use Claude to generate something, I hold the copyright provided Claude didn’t verbatim copy another project.
why wouldn't Anthropic own it? they generated it?
It is not "beyond obvious" that a cat cannot have copyright, given the lawsuit about a monkey holding copyright [1], and the way PETA tried to use that case as precedent to establish that any animal can hold copyright.

[1] https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

Anthropic could at least make a compelling case for the copyright.

It becomes legally challenging with regards to ownership if I ever use work equipment for a personal project. If it later takes off they could very well try to claim ownership in its entirety simply because I ran a test once (yes, there's a whole Silicon Valley season for it).

I don't know if they'd win, but Anthropic absolutely would be able to claim the creation of that code was done on their hardware. Obviously we aren't employees of theirs, though we are customers that very likely never read what we agreed to in a signup flow.

Using work equipment for a personal project only matters because you signed a contract giving all of your IP to your employer for anything you did with (or sometimes without) your employer's equipment.

Anthropic's user agreement does not have a similar agreement.

My point was that they could make a compelling case though, not that they would win.

I don't know of any precedent where the code was literally generated on someone else's system. It's an open question whether that implies any legal right to the work and I could pretty easily see a court accepting the case.

Who owns the copyright for something not written by anybody, you ask? Is it the man who pays to have it written, or the owner of the machine that does the writing? But it is neither. Nobody owns the copyright because nobody has written it.
I think all you need to do is claim that your girlfriend is your laptop. /s