HN Mirror

	by sync 297 days ago
	I'm doing coreference resolution, and this model (without thinking) performs at the level of Gemini 2.5 Pro (with thinking_budget set to -1) at a fraction of the cost.

2 comments

Nice point. How did you test for coreference resolution? Specific prompt or dataset?

Strong claim there!