Hacker News
by stephc_int13 138 days ago
This is a nice benchmark IMO. I would be curious to see how competitors and improved models would compare.
1 comment

And how long will it take before an open model recreates this? The "vibe" consensus before "thinking" models really took off was that open models were ~6mo behind SotA. With the massive RL improvements, over the past 6 months I've thought the gap was actually increasing. This will be a nice little verifiable test going forward.