by jlas
481 days ago
> (more resources = closer to intelligence)

The scaling law only states that more resources yield lower training loss (https://en.wikipedia.org/wiki/Neural_scaling_law). For an LLM, I'd guess training loss measures how well it predicts the next token. So maybe the real question is: is next-token prediction all you need for intelligence?

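To make "training loss" concrete, here is a minimal sketch, assuming the loss is the usual average cross-entropy over the true next tokens. The function name and the probabilities are made up for illustration, not taken from any real model.

```python
import math

def next_token_loss(probs_for_true_tokens):
    """Average negative log-likelihood of the true next tokens (cross-entropy)."""
    total = sum(math.log(p) for p in probs_for_true_tokens)
    return -total / len(probs_for_true_tokens)

# Hypothetical probability each model assigns to the correct next token
# at three positions in some text.
weak_model = [0.1, 0.2, 0.05]
strong_model = [0.6, 0.7, 0.5]

# The model that puts more probability on the right token gets a lower loss.
print(next_token_loss(weak_model) > next_token_loss(strong_model))
```

Lower loss here only means better next-token guesses on the data; whether that amounts to intelligence is exactly the open question.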
And before we jump to “the token predictor could compensate for that…”, maybe we should consider that the reason this is the case is that intelligence isn’t actually something that can be modeled with strings/tokens.