by simonw 432 days ago
I don't think I understood your point then. I matched it with the common "LLMs can only produce code that's similar to what they've seen before" argument.

Reading back, you said:

> I often see people wondering if the some coding task is performed well or not because of availability of code examples in the training data. It's way worse than that. It's overfitting to diffs it was trained on.

I'll be honest: I don't understand what you mean by "overfitting to diffs it was trained on" there.

Maybe I don't understand what "overfitting" means in this context?

(I'm afraid I didn't understand your cannon / fly swatter analogy either.)


It's overkill. The models do not capture knowledge about coding. They overfit to the dataset. When one distills data into a useful model, the model can be used to predict the future behavior of the system.

That is the premise of LLM-as-AI. By training these models on enough data, knowledge of the world is purported to have been captured, creating something useful that can be leveraged to process new input and predict the trajectory of the system in some phase space.

But this, I argue, is not the case. The models merely overfit to the training data, hence the variable results people perceive. When their intentions and prompt fit the data in the training set, the model appears to give good output. But when the situation and prompt do not fit, the models do not "reason" about it or "infer" anything. They fail. They give you gibberish or go in circles, or worse, if there is some "agentic" arrangement, they fail to terminate and burn tokens until you intervene.

It's overkill, and that is what I am pointing out. It's not a clever system for creating code for any given situation. It overfits to the training dataset. And your response is to claim that my argument is something else: not that it's overkill, but that it can only kill dead things. I never said that. I can see it's more than capable of spitting out useful code even if that exact code is not in the training dataset. But it is just automating the process of going through Google, docs and Stack Overflow and assembling something for you. You might be good at searching, and lucky, and it gives you just what you need. Or you might not be used to picking the right keywords, or you might be using some uncommon language, or working in a domain that happens not to be well represented, and then it feels less useful. But instead of just coming up short the way search does, the model is overkill and wastes your time and god knows how much subsidized energy and compute. Lucky you if you're not burning tokens on some agentic monstrosity.

You are correct that variable results could be a symptom of a failure to generalise well beyond the training set.

Such failure could happen if the models were overfit, or for other reasons. I don't think 'overfit', which is pretty well defined, is exactly the word you mean to use here.

However, I respectfully disagree with your claim. I think they are generalising well beyond the training dataset (though not as far beyond as, say, a good programmer would, at least not yet). I further think they are learning semantically.

I can't prove it in a comment, except to say that there's simply no way they'd be able to successfully manipulate such large pieces of code using English-language instructions if they weren't great at generalisation and OK at understanding semantics.

I understand your position. But I think you're underestimating just how much training data is used and how much information can be encoded in hundreds of billions of parameters.

But this is the crux of the disagreement. I think the models overfit to the training data, hence the fluctuating behavior. You think they show generalization and semantic understanding. Which, yeah, they apparently do. But in my opinion the failure modes show that they don't, and those failures would be explained by overfitting.

If that's the case, it turns out that what I want is a system that's "overfitted to the dataset" on code, since I'm getting incredibly useful results for code out of it.

(I'm not personally interested in the whole AGI thing.)

Good man, I never said anything about AGI. Why do you keep responding to things I never said?

This whole exchange was you having knee-jerk reactions to things you imagined I said. It has been incredibly frustrating. And at the end you shrug and say "eh it's useful to me"??

I am talking about this because of deceitfulness, resource efficiency, and the societal implications of technology.

"That is the premise of LLM-as-AI" - I assumed that was an AGI reference. My definition of AGI is pretty much "hyped AI". What did you mean by "LLM-as-AI"?

In my own writing I don't even use the term "AI" very often because its meaning is so vague.

You're right to call me out on this: I did, in this earlier comment - https://news.ycombinator.com/item?id=43644662#43647037 - commit the sin of responding to something you hadn't actually said.

(Worse than that, I said "... is uninformed in my opinion" which was rude because I was saying that about a strawman argument.)

I did that thing where I saw an excuse to bang on one of my pet peeves (people saying "LLMs can't create new code if it's not already in their training data") and jumped at the opportunity.

I've tried to continue the rest of the conversation in good faith though. I'm sorry if it didn't come across that way.

> My definition of AGI is pretty much

Simon, intelligence exists (and unintelligence exists). When you write «I'm not claiming LLMs can invent new computer science», you imply intelligence exists.

We can implement it. And it is somewhat urgent, because intelligence is a very desirable form of wealth, and there is definite scarcity. It is even more urgent now that the recent hype has left some people perversely confused about the very idea of intelligence.

We can and must go well beyond the current state.