by jlas 481 days ago
> (more resources = closer to intelligence)

The scaling law only states that more resources yield lower training loss (https://en.wikipedia.org/wiki/Neural_scaling_law). For an LLM, I guess training loss measures how well it predicts the next token.
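
To make "training loss" concrete, here's a minimal sketch, assuming the usual cross-entropy objective. The function name `next_token_loss` and the toy probabilities are made up for illustration and don't come from any particular library or model:

```python
import math

def next_token_loss(predicted_probs, target_tokens):
    """Average cross-entropy (negative log-likelihood) over a sequence.

    predicted_probs: one dict per position, mapping candidate token -> the
                     probability a (hypothetical) model assigned to it.
    target_tokens:   the token that actually came next at each position.
    """
    total = 0.0
    for probs, target in zip(predicted_probs, target_tokens):
        # The loss at each position depends only on the probability
        # given to the token that actually occurred.
        total += -math.log(probs[target])
    return total / len(target_tokens)

# Toy data (assumed, not real model output).
targets = ["cat", "sat"]
confident = [{"cat": 0.9, "dog": 0.1}, {"sat": 0.8, "ran": 0.2}]
unsure = [{"cat": 0.5, "dog": 0.5}, {"sat": 0.5, "ran": 0.5}]

print(next_token_loss(confident, targets))  # lower loss: better predictions
print(next_token_loss(unsure, targets))     # higher loss: coin-flip guesses
```

Lower loss here only means the right next token got more probability. It says nothing by itself about anything beyond prediction quality, which is the gap the question below is about.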

So maybe the real question is: is next-token prediction all you need for intelligence?


As a human, I can often solidify ideas by writing them out and editing what I wrote, in a way that wouldn’t really work if I could only speak them aloud one word at a time, in order.

And before we go to “the token predictor could compensate for that…”, maybe we should consider that this is the case because intelligence isn’t actually something that can be modeled with strings/tokens.

Yann LeCun discussed why LLMs are not enough for AGI on the Lex Fridman podcast: https://youtu.be/5t1vTLU7s40?t=138
I really liked the simplicity of his explanation in information theory terms. Thank you!