Hacker News
sync, 297 days ago
I'm doing coreference resolution, and this model (without thinking) performs at the level of Gemini 2.5 Pro (with thinking_budget set to -1, i.e. dynamic thinking) at a fraction of the cost.
antman, 297 days ago
Nice point. How did you test for coreference resolution? Specific prompt or dataset?
dr_dshiv, 297 days ago
Strong claim there!