HN Mirror


	by potatolicious 320 days ago
	Honestly, this feels like the biggest gain. I work on things that do a lot of tool calling, and the model hallucinating fake tools is a huge problem. Worse, sometimes the model will hallucinate a response directly without ever generating the tool call. Hopefully the new training rewards that suppress hallucinations and tool-skipping push us in the right direction.
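
For readers who haven't hit this problem, here is a minimal sketch of the kind of guard people put in front of tool execution to catch hallucinated tools. Everything in it is an assumption for illustration: the `KNOWN_TOOLS` registry, the tool names, and the JSON shape (`name` plus `arguments`) are hypothetical and not tied to any particular model API. It only covers the first failure mode; a model that skips the tool call and fabricates the answer never reaches this check at all.

```python
import json

# Hypothetical registry: tool name -> required argument names.
KNOWN_TOOLS = {
    "get_weather": {"city"},
    "search_docs": {"query"},
}


def validate_tool_call(raw: str) -> tuple[bool, str]:
    """Check a model-emitted tool call before executing it.

    Assumes the call arrives as a JSON object with 'name' and 'arguments'.
    Returns (is_valid, reason).
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "not a JSON object"

    name = call.get("name")
    # Reject any tool the model invented rather than picked from the registry.
    if not isinstance(name, str) or name not in KNOWN_TOOLS:
        return False, f"unknown tool: {name!r}"

    args = call.get("arguments")
    if not isinstance(args, dict):
        return False, "arguments must be an object"

    missing = KNOWN_TOOLS[name] - set(args)
    if missing:
        return False, f"missing arguments: {sorted(missing)}"
    return True, "ok"
```

On failure, the reason string can be fed back to the model as an error message so it can retry with a real tool, rather than the call being silently dropped.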