| HN Mirror


	by zamalek 84 days ago
	It's because of how transformers work, especially the fact that the output layer produces a score (a logit) for every token in the vocabulary, which gets turned into probabilities that we quite literally do a weighted random choice from. My hunch is that diffusion models would have a higher chance of doing real reasoning, or at least of developing something like a latent space for reasoning. Thinking that LLMs are intelligent arises from an incomplete understanding of how they work or, alternatively, from having shareholders to keep happy.
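
For anyone who hasn't seen the "weighted random choice" step spelled out, here is a rough sketch of it in plain Python. The three-word vocabulary, the logit values, and the helper names `softmax` and `sample_token` are all made up for illustration. A real model does this over tens of thousands of tokens and usually adds tricks such as top-k or top-p filtering, which are left out here.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw output-layer scores (logits) into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature=1.0, rng=random):
    """Pick one token by weighted random choice over the softmax probabilities."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


# Hypothetical toy vocabulary and logits, not taken from any real model.
vocab = ["cat", "dog", "fish"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)
token = sample_token(vocab, logits, rng=random.Random(0))
print(probs, token)
```

Lowering `temperature` sharpens the distribution toward the highest-scoring token, and raising it flattens the distribution, but either way the final pick is a draw from a probability table.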