| HN Mirror



	by sigmar 51 days ago
	Agree with this. It seems strange to me to frame the "training recall" as cheating (33 of the 38 cheating instances). Most people think of "cheating" as breaking rules. How is the LLM supposed to not use what was put into its weights?

3 comments

notnullorvoid 51 days ago

While I probably wouldn't classify it as cheating, it is an even bigger signal of concern for model quality.

Cheating by breaking the rules at least implies some learned patterns.

Repeating training data verbatim for narrow cases like this implies that the model is overfitting.
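One rough way to put a number on "repeating training data verbatim" is to measure how many of the output's word n-grams also appear in a known reference solution. This is only a minimal sketch: the function names `ngrams` and `verbatim_overlap`, the whitespace tokenization, and the default window of 8 words are all assumptions for illustration, not how any benchmark discussed here detects recall.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-token windows in `tokens`."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(output: str, reference: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also occur in the reference.

    Tokenization is naive whitespace splitting (an assumption); a value
    near 1.0 suggests the output largely reproduces the reference.
    """
    out_grams = ngrams(output.split(), n)
    if not out_grams:
        # Output is shorter than n tokens, so there is nothing to compare.
        return 0.0
    ref_grams = ngrams(reference.split(), n)
    return len(out_grams & ref_grams) / len(out_grams)
```

A high score only shows textual overlap with one reference; it says nothing about whether the text came from memorized weights or was derived independently, so it is a screening heuristic at best.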


Spartan-S63 50 days ago

If we're evaluating a person, rote recall is not necessarily cheating. It's expected, but then you'd expect them to apply that rote-memorized information in a novel way later on and prove they understand how they applied their priors to the new situation.

Models don't actually reason in the same sense, so rote recall from their training data is "cheating" in the sense that the training data cheated, not the model. So many of those benchmarks have snaked their way into training data that they have become less useful as benchmarks. That, I think, is going to be a long-term difficulty in quantitatively assessing model quality and "intelligence." So it is cheating, in the sense of what we expect from the models and training data, but not in a human sense.


greenavocado 50 days ago

Memorization is NOT problem-solving ability, and many people care about the latter.


anematode 51 days ago

By writing a not-identical, but valid, solution? Any modestly complex engineering problem has many solutions.

This is an obvious example of why LLM training is so different than human learning.


simoncion 51 days ago

I expect any well-informed corporate lawyer who has thought about this carefully is strongly advising that these tools not be used. When the LLM [0] barfs up some nontrivial code that's covered by the AGPL and your company's devs, entirely unaware of its provenance, put it into the company's "all rights reserved" codebase, it's going to be a nightmare to come back from that.

[0] ...that Nvidia's CEO says they should be spending 50% of a senior dev's salary per seat per year on...


senordevnyc 51 days ago

The ship sailed on this a long time ago.


simoncion 51 days ago

Oh definitely not. We're not yet solidly out of the "extremely exuberant hype" phase, so the folks that matter tend to not ask questions that dampen the mood.


senordevnyc 51 days ago

Sorry to tell you friend, but LLMs have touched the vast majority of active codebases out there, whether you like it or not. You can tell yourself that you’re one of “the folks that matter” (lol) all you want, but we’re never going back.


customguy 50 days ago

That's what people told Ignaz Semmelweis, too, I assume. "Nothing you can do, the powers that be decided, you are a minority, you don't matter, lol!" Snickering in the shadow of what they won't confront at those who do.


simoncion 50 days ago

> You can tell yourself that you’re one of “the folks that matter” (lol)...

kek. I'm a frequent commenter on HN. I'm definitely not one of the folks that matter.

> ...LLMs have touched the vast majority of active codebases out there...

I agree that LLM use is widespread. I disagree that LLMs have "touched the vast majority of active codebases".

Regardless, the courts are slow and Open Source license violation cases are even slower. You seem like you'd be unaware of how terrified so many businesses are of having AGPL code deployed in their systems. In my professional experience, a great many businesses will refuse to deploy systems that contain AGPL-licensed utilities... even if those utilities are only used for internal housekeeping purposes, and their only remote communications method is a UNIX socket used for communications with a CLI control utility that can only be used when you're SSHed into the system. If they're aware of any AGPL'd code anywhere, they will not touch it.

No amount of LLM-provider-provided indemnification can save you from license obligations you've become bound to by creating and distributing a derivative work. People who are in the know know that these tools occasionally regurgitate nontrivial portions of their input data, verbatim. Such people also know that AGPL-licensed code is absolutely in their input data. I'd wager that getting a nontrivial amount of *GPL'd code plopped into your company's "all-rights-reserved" codebase by one of these tools is more likely than the typical US driver personally being in a nontrivial automobile collision.

In the US, people go their entire lives without getting in nontrivial automobile collisions, but they usually wear their seatbelts... even prior to widely-deployed surveillance cameras. I wonder why. It seems like an awful lot of boring, repetitive work for a thing that's really never going to happen to you in your lifetime.


torginus 51 days ago

I mean people expect a model to give a working solution. They also expect it to provide it in as few tokens as possible (input/output). They might expect it to come up with an original solution, but I don't think most people would compromise on the first two points.
