by usaar333
650 days ago
I'm not sure how well Codeforces percentiles correlate with software engineering ability. Looking at all the data, this still isn't a game changer. Key notes:

1. AlphaCode 2 was already at 1650 last year.

2. SWE-bench Verified under an agent has gone from 33.2% to 35.8% with this model (which doesn't really matter). The full model is at 41.4%, which still isn't a game changer either.

3. It's not handling open-ended questions much better than GPT-4o.

Claude, on the other hand, has been fantastic and seems to do similar reasoning behind the scenes with RL.