Hacker News new | ask | show | jobs
by mxkopy 1101 days ago
> Yet, they demonstrate a crucial point: a deeper understanding of reality simplifies next-token prediction tasks.

I'm not sure LLMs are trained to simplify anything. They have billions of parameters after all.

1 comments

They "simplify" the training data, which they are vastly smaller than. LLMs are like compression algorithms. You could imagine feeding the training data back in, letting it guess the next token, and entropy coding the residual - this would result in an excellent compression ratio. This compression performance is a direct consequence of abstract features of the dataset that it has managed to encode - knowing that the capital of France is Paris allows you to make predictions about many sentences, not just "The capital of France is...".
True, but I still think there's some fallacy here. Are we sure that models of the world (i.e. understanding) are the only way to achieve compression?