Hacker News new | ask | show | jobs
by simsla 255 days ago
There's no inductive bias for a world model in multiheaded attention. LLMs are incentivized to learn the most straightforward interpretation/representation of the data you present.

If the data you present is low entropy, it'll memorize. You need to make the task sufficiently complex so that memorisation stops being the easiest solution.