I keep seeing this cop-out, which ignores that it's fundamentally the same architecture, and has the same flaws. More wallpaper to hide the cracks better makes it an even worse tool for these use cases because all it does is fool more people into thinking it has capabilities that it fundamentally doesn't.
I don't think this is a fair argument. If we compare a GPT4 architecture with 5,000 parameters and a GPT4 architecture with 1 trillion parameters, should we judge the capabilities of both by the 5,000 parameter version, because they're both the same architecture?
There is more than architecture that can set them apart as well. GPT4 may have been trained by a slightly different algorithm, or on different data, and this can result in fundamentally different results.
Most of these conversations are not focused on one specific version, but are about the capabilities of LLMs in general, and it is implied we are talking about state-of-the-art LLMs, and GPT3 is no longer state-of-the-art.
This is nonsense. It's not a cop-out to say "use the latest, most capable model before complaining". Anyone remotely close to this field knows model size matters, amount of training data matters, quality of training data matters, and several other variables matter. Even if someone knows zero about it, just using 3.5 vs. 4 is enough to see they are two different things. Like a lizard vs. a human.
It's still fundamentally the same, hallucinates just the same, and anthropomorphizes itself as a confident, knowledgeable, intelligent being just the same. A newer, better, faster, more capable car still isn't an airplane, even if it goes fast enough to spend several seconds in the air.
Sure, and 40 year olds have the same capabilities as 4 year olds, because "same architecture" or "fundamentally the same". And putting random weights inside the GPT-4 model architecture should behave "fundamentally the same" as the trained GPT-4 weights, because it's "same architecture". Forget this "training" stuff.
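To make the random-weights point concrete, here is a toy sketch. It is nothing like GPT-4: the "architecture" is a single linear unit I made up for illustration, and every name in it (`predict`, `train`, `mean_squared_error`) is hypothetical. It only shows that the same architecture with random weights and with trained weights can behave very differently.

```python
import random

def predict(weights, x):
    # Fixed "architecture": one linear unit, y = w * x + b.
    w, b = weights
    return w * x + b

def mean_squared_error(weights, data):
    # Average squared error of the model over (x, y) pairs.
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def train(weights, data, lr=0.01, steps=2000):
    # Plain gradient descent on the mean squared error.
    w, b = weights
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

random.seed(0)

# Toy task (an assumption for this sketch): learn y = 3x + 1.
data = [(x, 3.0 * x + 1.0) for x in range(-5, 6)]

# Same architecture, two sets of weights.
random_weights = (random.uniform(-1, 1), random.uniform(-1, 1))
trained_weights = train(random_weights, data)

print("random weights loss: ", mean_squared_error(random_weights, data))
print("trained weights loss:", mean_squared_error(trained_weights, data))
```

Both models share every line of `predict`; only the numbers in `weights` differ, and that alone decides whether the output is useful.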
It's not a person, it's a machine. And it's one that will still produce hallucinations that embarrassingly prove that it has no notion of intelligence, and do so confidently. That it does so less than its sibling is entirely irrelevant.
To me it's a bit like someone making the claim "humans are flawed, and we should think critically about the things they say", and someone responding with "well which human are you talking about? Because Einstein is orders of magnitude above the Walmart checkout guy".