| HN Mirror


	by Tostino 77 days ago
	You have a benchmark for output token reduction, but no before/after comparison on a standard LLM benchmark to check whether the instructions hurt intelligence. Telling the model to do only post-hoc reasoning is an interesting choice, and it may not play well with all models.