by simonw
432 days ago
I don't think I understood your point then. I matched it with the common "LLMs can only produce code that's similar to what they've seen before" argument.

Reading back, you said:

> I often see people wondering if the some coding task is performed well or not because of availability of code examples in the training data. It's way worse than that. It's overfitting to diffs it was trained on.

I'll be honest: I don't understand what you mean by "overfitting to diffs it was trained on" there. Maybe I don't understand what "overfitting" means in this context?

(I'm afraid I didn't understand your cannon / fly swatter analogy either.)
|
That is the premise of LLM-as-AI. By training these models on enough data, knowledge of the world is purportedly captured, creating something useful that can be leveraged to process new input and predict the trajectory of the system in some phase space.
But this, I argue, is not the case. The models merely overfit to the training data. Hence the variable results people perceive. When their intentions and prompt fit the data in the training set, the model appears to give good output. But when the situation and prompt do not, the model does not "reason" about it or "infer" anything. It fails. It gives you gibberish or goes in circles, or worse, if there is some "agentic" arrangement, it fails to terminate and burns tokens until you intervene.
It's overkill. And I am pointing out that it is overkill. It's not a clever system for creating code for any given situation. It overfits to the training data set. And your response is to claim that my argument is something else: not that it's overkill, but that it can only kill dead things. I never said that. I can see it's more than capable of spitting out useful code even if that exact same code is not in the training dataset. But it is just automating the process of going through Google, docs and Stack Overflow and assembling something for you. You might be good at searching, and lucky, and it is just what you need. You might not be so used to using the right keywords, or you might be using some uncommon language, or working in a domain that happens to not be well represented, and then it feels less useful. But instead of just coming up short the way search does, the model overkills and wastes your time and god knows how much subsidized energy and compute. Lucky you if you're not burning tokens on some agentic monstrosity.