Hacker News new | ask | show | jobs
by wongarsu 440 days ago
That argument only really applies to base models. After that we train them to give correct and helpful answers, not just answers that are statistically probable in the training data.

But even if we ignore that subtlety, it's not obvious that training a model to predict the next token doesn't lead to a world model and an ability to apply it. If you gave a human 10 physics books and told them that in a month they have a test where they have to complete sentences from the book, which strategy do you think is more successful: trying to memorize the books word by word or trying to understand the content?

The argument that understanding is just an advanced form of compression far predates LLMs. LLMs clearly lack many of the facilities humans have. Their only concept of a physical world comes from text descriptions and stories. They have a very weird form of memory, no real agency (they only act when triggered) and our attempts at replicating an internal monologue are very crude. But understanding is one thing they may well have, and if the current generation of models doesn't have it the next generation might