Hacker News
by hbrundage 667 days ago
Isn't the regression from 63% to 54% on MMLU-Pro a huge issue? They said that it excels at advanced reasoning, but a 9-point drop seems like a big drawback there.
1 comment

Yeah, it doesn't win in every category. I will say, watching it in the Discord, I saw its performance vary widely, so the context and system prompt play a huge role. Initially it did great and solved some pretty heavy logic questions, but after the context was loaded with trolling it degraded quite a bit and couldn't solve problems it had previously been able to.