	by simsla 255 days ago
	There's no inductive bias toward a world model in multi-head attention. LLMs are incentivized to learn the most straightforward representation of the data you present. If that data is low entropy, the model will just memorize it. You need to make the task complex enough that memorization stops being the easiest solution.
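
	To make "low entropy" a bit more concrete, here is a rough sketch that compares two toy token sequences by the Shannon entropy of their empirical unigram distribution. The helper name `empirical_entropy` and both toy corpora are made up for illustration, and unigram entropy is only a crude proxy for how memorizable a dataset is, not a measure of whether a model will actually memorize it.

```python
import math
from collections import Counter


def empirical_entropy(tokens):
    """Shannon entropy, in bits per token, of the empirical unigram distribution.

    Hypothetical helper for illustration; it ignores token order entirely.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy corpora (assumed for illustration, not taken from any real dataset).
low = ["a", "b"] * 8               # two symbols repeated: very predictable
high = list("abcdefghijklmnop")    # 16 distinct symbols, each seen once

low_entropy = empirical_entropy(low)
high_entropy = empirical_entropy(high)
print(f"low: {low_entropy:.2f} bits/token, high: {high_entropy:.2f} bits/token")
```

	The repetitive sequence comes out at about 1 bit per token and the varied one at about 4, which is the sense in which the first is "easier" to reproduce by rote.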