by spaceman_2020
650 days ago
1673 Elo is wild. If it's actually true in practice, I sincerely cannot imagine a scenario where it would be cheaper to hire actual junior or mid-tier developers (keyword: "developers", not architects or engineers). 1673 Elo should be able to build very complex, scalable apps with some guidance.
1. AlphaCode 2 was already at 1650 last year.
2. SWE-bench Verified under an agent has jumped from 33.2% to 35.8% with this model, which doesn't really matter. The full model is at 41.4%, which still isn't a game changer either.
3. It's not handling open-ended questions much better than GPT-4o.