Hacker News
by meatmanek 309 days ago
Reasoning models do a lot better at AIME than non-reasoning models, with o3-mini getting 85% and 4o-mini getting 11%. It makes some sense that this would apply to small models as well.