| HN Mirror


by eschaton 55 days ago

“Don’t get mad at people for doing something unethical or immoral, or they’ll do something unethical or immoral!”

Disabling attribution of LLM-generated code is fraud, because you’re saying you wrote the code.

Of course that fits right in with the use of an LLM to generate code in the first place, since what it’s actually doing is regurgitating its inputs stripped of any license and copyright notice.

4 comments

UebVar 55 days ago

I'm very certain that this is not fraud, across multiple legal systems, both Roman and common law. In both cases fraud requires that a person be deprived of a material good. Neither the defrauded person nor their material loss is present in this case. Maybe there is an oddball legal system somewhere in the world where fraud is something entirely different, but I doubt it. "Fraud", just like "Decorator Pattern", is a well-established and pretty simple concept, even if there are edge cases. This does not fit at all.

In academia this is misattribution; outside of academia this does not exist.

This is clearly not copyright infringement either, as LLMs do not claim copyright, nor could they. Just like the photograph taken by the monkey, or pictures drawn by crows. LLM output is not a creative work either.

Whether this is unethical or immoral is a totally different question. I really don't think so, and I don't think you argue that position well.

eschaton 55 days ago

It is misrepresentation for gain, and that gain does not need to be monetary to be material. For example, it can be reputational.

It also is copyright infringement, because what the LLM “generates” are actually portions of its training set, which were covered by copyright. Just passing through an LLM does not remove that copyright from that work.

UebVar 55 days ago

No, you are wrong.

In German and French (Roman) legal systems this is a "Vermögensdelikt", and explicitly about material damage and gain. Yes, common law can be broader (in Canada it isn't really, it just also includes services, btw.), and yet this clearly does not meet the definition, as the definition requires a damaged/defrauded party and a fraudulent/gaining party. We are not talking about somebody usurping somebody else's reputation, after all.

You misuse a technical term that has been well established since antiquity.

You do not know what this word means. If you want to argue about semantics, look up the definition. This works especially well for legal terms as laws define them.

(That said, IANAL and there are very many different legal systems and I am not ruling out that there exists one that is completely different - laws can be changed at will, after all.)

It is also obviously not copyright infringement, because this is simply not how copyright works, at all. I cannot and will not explain all of copyright here. Instead I will point this out: all code produced by a human who has read copyrighted code would fall under your definition.

eschaton 53 days ago

No, you are wrong. You are either willfully misunderstanding what I’m calling fraud, or you are misinformed as to what “material gain” means in many legal systems.

With respect to the former, “fraud” is a shorthand for “fraudulent misrepresentation,” which is what you’re doing when you take someone else’s IP and try to contribute it to a project without securing the right to do so. It can be read as implicit in the attempt to contribute to the project that you have secured this permission (or do not need to, because the work is original to you). Whether the code came out of an LLM or was copied from another project or Stack Overflow doesn’t matter; it’s that you’re misrepresenting the rights you have that’s the fraudulent part.

For the latter, I specifically pointed out that the gain from fraudulent misrepresentation need not be monetary. The gain can be reputational or any other sort of benefit. For example, someone pretending to be a fictional person to gain access to a space they otherwise wouldn’t is still committing fraud.

Finally, you’re wrong about whether the output of an LLM infringes copyright of material in its training set. Just running a copyrighted work through an LLM does not remove the copyright on that work if reproduced by the LLM.

UebVar 46 days ago

You are misinformed; I suspect you have no idea what you are talking about.

As I said, I do not know all legal systems in the world. If there is one where "material gain" matches your idea, please cite the law or a case that includes LLM usage. As I explained, Canadian law even includes services, and yet this very much does not match the definition, for the reasons explained.

I do understand very well what you mean by "fraud", and I do not misrepresent it - your opinion on what it should be is plainly and simply wrong. I explained why in my previous posts.

You are under the impression that legal science is some kind of folk etymology. It absolutely is not. Fraud is §263 StGB, Art. 313-1 Code pénal or §380 of the Canadian Criminal Code. (They are all remarkably similar, because they share a millennia-old tradition, making them IMHO fascinating cultural artifacts.) Here [0] is a structured version of one of these texts. Think of it as a symbolic execution of the law. You can see there is a structural mismatch with your "case". Nobody usurps anything from somebody else, and all three laws include that in their definition. That was my original claim.

You think you can somehow make up your own private definitions, develop your own private theories about them, apply them and argue about the semantics of your made-up terms. That is the opposite of how jurisprudence works. It is rigorous, with well-established scientific and scholastic methods. It operates on terms defined by the law. In the case of "fraud", that means the previous citations, especially in criminal law, and nothing else. German legal science has its own theory of what counts as "nothing else" under the name "Wortlautgrenze". These terms and methods vary from jurisdiction to jurisdiction, but by surprisingly little.

Don't call your code a decorator pattern because you think it is decorative. Different pattern libraries have definitions for that, and you need to be able to argue that it fits. Likewise, if you feel something involves some kind of misrepresentation, it's probably not fraud. If things have different names, that is probably for a good reason, especially in legal science.

[0] https://www.iurastudent.de/schemata/schema-zum-betrug-263-i-...

jhack 55 days ago

"Disabling attribution of LLM-generated code is fraud, because you’re saying you wrote the code."

Should there be attribution for Google or Stack Overflow copy/paste? Who should we bully about this?

School-Cotton 55 days ago

> Should there by attribution for Google or Stack Overflow copy/paste?

Obviously, and I'm a bit taken aback that anyone thinks otherwise.

eschaton 55 days ago

Yes, in fact, this is why people who do that are looked down upon.

They are in fact committing fraud if they do not attribute the code in their commit properly, because by committing it they’re claiming to have rights by virtue of authorship that they do not have. (Namely, the right to contribute that code to the project.) They may also be committing copyright infringement, depending on the copyright and license status of some code they found via Google or Stack Overflow.

It’s always fascinating to me to see how many people on Hacker News have such extremely poor understanding of how intellectual property actually works, and how misrepresenting themselves or their work can actually have consequences.

dml2135 55 days ago

Are there any court cases you can point to that have clearly established that using LLM generated code can be a copyright violation? My understanding is that this is very far from being settled law.

eschaton 55 days ago

What cases can you cite that have determined it’s not?

It’s clear on its face that LLMs can and do store and reproduce copyrighted works, using a form of (somewhat) lossy data compression. And using a lossy stochastic or perceptual form of compression to reproduce a copyrighted work doesn’t somehow make it not storage or reproduction; otherwise sharing MP3 files wouldn’t be copyright infringement.

Anyone engaging in responsible risk management should assume that anything LLM-generated is infringing until determined otherwise by the courts, not the other way around.

dml2135 54 days ago

There are billion-dollar entities preparing to fight this very question out in court as we speak.

Your interpretation of the law is certainly plausible, but it is clearly not a settled question.

If you really are so confident, go bet on Kalshi and make some easy money: https://kalshi.com/markets/kxnytoai/new-york-times-wins-open...

Leynos 55 days ago

Outside of situations where it is required by contract, attributing AI usage is a courtesy, nothing more.

eschaton 53 days ago

So it’s OK to just paste other people’s IP into a change you’re submitting to a project without caring about the license or originator?

Leynos 43 days ago

I said "outside of situations where it is required by contract", which I believe would include a CLA.

infamouscow 55 days ago

It's only fraud if a person signed their name stating such.

Their name being attached to the commit is, in itself, irrelevant, as there is no way to submit a patch otherwise. You could use a fake name, but you're just moving this fraud problem around.

You're going to have a hard time convincing anyone that using a tool constitutes fraud. Frankly, it's silly, if not genuinely stupid.

Film photographers in the early 2000s routinely called digital "not real photography" and Photoshop "cheating" because you could delete bad shots and fix everything later. Traditional musicians and critics dismissed drum machines, synthesizers, and autotune as soulless tools.

eschaton 55 days ago

Intent and custom both matter quite a bit in law. It is customary to treat the name attached to a commit as the copyright holder of any changes represented by that commit, just as it was for the sender of an email containing a patch back when that was how such work was done.

Often this is also spelled out in a project’s contribution guidelines, and some projects have even had more explicit copyright assignment policies they required contributors to agree to, but the lack of such guidelines or assignment policies does not mean the custom as normally observed in the field is irrelevant.

kelnos 55 days ago

> Intent and custom both matter quite a bit in law.

Indeed, and I'm not aware of any (Western, at least) legal system that would consider it fraud to not disclose that an LLM had generated some code.

I'd like to gently point out that your insistence on calling this fraud is hurting your overall argument, and is causing people to focus on the language you're using instead of the substance of what you're trying to say. I do agree with you that people should disclose LLM generation when writing commits. But the way you're going about arguing this "fraud" thing is an unproductive dead end.

eschaton 54 days ago

The fraud isn’t (directly) in hiding that the LLM generated some code. The fraud is in the (implicit) misrepresentation of ownership of and/or rights to the code.

When you send a patch or pull request to a project, you’re saying (implicitly) that you have the necessary rights to contribute the intellectual property it contains. If you used an LLM to “generate” some of it, that is not necessarily the case.

A similar situation would occur if you agreed to pay someone else to create a patch, and then submitted it under your own name without paying them. Because it’s a work for hire, it’s not yours until they’re paid for it, so you’re fraudulently misrepresenting your rights to that patch to the project. If you did pay the creator, you don’t have to attribute them unless it’s in the contract between you and the creator, or unless the project requires such attribution.