by kromem 951 days ago
This line of nonsense has become the new "Tell me you aren't current on the past 12 months of LLM research without telling me."

Harvard/MIT's Othello-GPT paper, which showed the development of what turned out to be linear representations of world models from training data that didn't explicitly contain that modeling, is over a year old now.

That in turn inspired research showing linear representations of geographical mapping, and of truth vs. falsehood in more traditional text models.

So there is already a growing body of research showing, over and over, linear representations of modeling more abstract than "just statistics."
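
For anyone unsure what "linear representation" means here: roughly, a feature counts as linearly represented if a plain linear map over the network's activations can read it out. Below is a minimal sketch of that kind of linear probe. To be clear about the assumptions: the "activations" are synthetic stand-ins I generate with NumPy, the function names are made up for illustration, and none of this is the code or data from the papers mentioned above.

```python
import numpy as np


def make_synthetic_activations(n, dim, rng, direction):
    """Hypothetical stand-in for hidden activations (NOT real model data).

    A binary feature is written along one fixed direction, and unit
    Gaussian noise is added in every dimension.
    """
    labels = rng.integers(0, 2, size=n)   # feature value: 0 or 1
    signs = 2.0 * labels - 1.0            # map {0, 1} -> {-1, +1}
    noise = rng.normal(size=(n, dim))
    acts = noise + 3.0 * signs[:, None] * direction[None, :]
    return acts, labels


def fit_linear_probe(acts, labels):
    """Fit a least-squares linear probe (weights plus a bias term)."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    y = 2.0 * labels - 1.0
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def probe_accuracy(w, acts, labels):
    """Fraction of examples where the sign of the probe matches the label."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    preds = (X @ w > 0).astype(int)
    return float((preds == labels).mean())


rng = np.random.default_rng(0)
dim = 64

# The direction along which the feature is (by construction) encoded.
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

train_acts, train_labels = make_synthetic_activations(2000, dim, rng, direction)
test_acts, test_labels = make_synthetic_activations(500, dim, rng, direction)

w = fit_linear_probe(train_acts, train_labels)
acc = probe_accuracy(w, test_acts, test_labels)

# Cosine similarity between the learned probe weights and the true direction.
cos = float(w[:-1] @ direction / np.linalg.norm(w[:-1]))

print(f"held-out probe accuracy: {acc:.3f}")
print(f"cosine(probe weights, true direction): {cos:.3f}")
```

On real networks the interesting part is that nobody plants the direction by hand: the probe is fit on activations recorded from a trained model, and high held-out accuracy is taken as evidence that the model encodes the feature linearly.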

So you are wrong that LLMs with sufficient network complexity don't develop an understanding of the world (in parts).

And I'd encourage looking more into the difference between training for next-token prediction and the overall capabilities of the network that achieves the smallest loss at that training task, particularly as network complexity increases.