HN Mirror


by jlas 481 days ago

> (more resources = closer to intelligence)

The scaling law only states that more resources (parameters, data, compute) yield lower training loss (https://en.wikipedia.org/wiki/Neural_scaling_law). For an LLM, I guess training loss measures how well it predicts the next token.
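To make "training loss" concrete, here's a minimal sketch, assuming the usual cross-entropy objective for next-token prediction. The function name, the 3-token vocabulary, and the probabilities are all made up for illustration; nothing here comes from a real model.

```python
import math

def next_token_loss(probs, targets):
    """Average cross-entropy: the mean of -log p(correct next token)."""
    total = sum(math.log(p[t]) for p, t in zip(probs, targets))
    return -total / len(targets)

# Hypothetical predicted distributions over a 3-token vocabulary
# at two positions in a sequence (illustrative numbers only).
confident = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
uncertain = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]]
targets = [0, 1]  # index of the token that actually came next

loss_confident = next_token_loss(confident, targets)
loss_uncertain = next_token_loss(uncertain, targets)
print(loss_confident < loss_uncertain)
```

A lower value only says the model puts more probability on the token that actually follows. Whether that adds up to intelligence is a separate question.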

So maybe the real question is: is next token prediction all you need for intelligence?

1 comment

pseudocomposer 481 days ago

As a human, I oftentimes can solidify ideas by writing them out and editing my writing in a way that wouldn’t really work if I could only speak them aloud a word at a time, in order.

And before we go to “the token predictor could compensate for that…” maybe we should consider that the reason is that intelligence isn’t actually something that can be modeled with strings/tokens.


jlas 481 days ago

Yann LeCun discussed why LLMs are not enough for AGI on the Lex Fridman podcast: https://youtu.be/5t1vTLU7s40?t=138


pseudocomposer 481 days ago

I really liked the simplicity of his explanation in information theory terms. Thank you!
