Hacker News
by ai_slop_hater 5 days ago
No, they are clearly not just scaled-up versions of GPT-2; there are different LLM architectures, like mixture of experts, that appeared relatively recently. I am not an expert though, far from it.

MoE and such are basically performance enhancements; they don't make the model smarter.
Separately trained experts can surpass performance in their activated regime, and that DOES result in a smarter model. The Claude system cards talk about this, and e.g. there is https://openreview.net/forum?id=iydmH9boLb to read...
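For readers unfamiliar with what "activated" means here, the following is a minimal sketch of top-k expert routing, not the method of any particular model. The toy experts, the router scores, and the choice to softmax only over the selected experts are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Assumption: weights are renormalized over the selected experts only.
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))

def moe_forward(x, experts, router_logits, k=2):
    """Run only the k selected experts and mix their outputs by router weight."""
    routes = top_k_route(router_logits, k)
    output = sum(w * experts[i](x) for i, w in routes)
    return output, [i for i, _ in routes]

# Hypothetical toy experts: each one just scales a scalar input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token

output, used = moe_forward(1.0, experts, router_logits, k=2)
print(used, output)
```

Only two of the four experts run for this token, which is where the compute saving comes from; whether that also makes the model "smarter" is exactly what the thread is arguing about.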
Performance enhancements are huge though.

If you can make the existing model faster, you can then save your inference budget to then make your model bigger, which then makes it smarter.

A lot of how smart the models can be comes down to budget. If you can make your existing thing cheaper, you can instead make it bigger for the same price.
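The budget argument can be made concrete with some back-of-the-envelope arithmetic. This is a rough sketch with made-up parameter counts; it assumes per-token compute scales with the number of active parameters and ignores memory, bandwidth, and routing overhead.

```python
def total_params(shared, per_expert, num_experts):
    """Parameters that must be stored: shared layers plus every expert."""
    return shared + per_expert * num_experts

def active_params(shared, per_expert, experts_per_token):
    """Parameters actually used for one token: shared layers plus routed experts."""
    return shared + per_expert * experts_per_token

# Hypothetical dense model: one "expert" that is always active.
dense_total = total_params(2_000_000_000, 5_000_000_000, 1)
dense_active = active_params(2_000_000_000, 5_000_000_000, 1)

# Hypothetical MoE model: 8 smaller experts, 2 active per token.
moe_total = total_params(2_000_000_000, 2_500_000_000, 8)
moe_active = active_params(2_000_000_000, 2_500_000_000, 2)

print(f"dense: {dense_total:,} total, {dense_active:,} active per token")
print(f"moe:   {moe_total:,} total, {moe_active:,} active per token")
```

Under these assumptions both models touch the same number of parameters per token, but the MoE one holds roughly three times as many in total. That is the "bigger for the same price" trade; the numbers say nothing by themselves about how much smarter the bigger model is.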

Not really “smarter” though? It’s just a big probability engine.

(Not trying to flame bait or anything. I just wouldn’t describe an LLM as exhibiting intelligence. It is great at making connections based on probability but doesn’t have a semantic understanding of what it is doing.)

You do realize modern neuroscience considers the human brain to be "just" a probability engine, and that intelligence may well be an organism's ability to predict well?

> doesn’t have a semantic understanding of what it is doing

I hope you realize this is an area of open, active research.

Didn't neuroscience have some big scandals about bad statistics and overstated findings (in addition to normal issues like replication)? Look up at least the "dead salmon study" (hint: it's related to fMRI, and you can probably guess its conclusions from its nickname). The "Voodoo Correlations" and "Cluster Failure" papers are also a bit eye-opening.

In general we (humans) need to be humble about the limitations of our knowledge about how we function, it's an insanely complicated problem.

> In general we (humans) need to be humble about the limitations of our knowledge about how we function, it's an insanely complicated problem.

We do.

Which is why we shouldn't be assuming we're more than just probability engines, or be assuming we have more consciousness than a neural network.

> to then make your model bigger, which then makes it smarter

There are diminishing returns, and at some point making a model bigger makes it dumber.

Maybe due to lack of data and dimensions other than words.
Performance enhancements are what allow you to train a bigger model.