	by jjoonathan 350 days ago
	Yeah, the heavily distilled models are very bad with hallucinations. I think they hallucinate to cover for their decreased capacity. A 1B model will happily attempt the same complex coding tasks as a 1T model, but the hard parts will be pushed into an API call that doesn't exist, lol.