by fancy_pantser
65 days ago
It's looking rather weak on reasoning and long-range problems with the approach described. For example, even with 16 agents and compaction, the HLE score is significantly below Anthropic's Mythos. Like you, I can see the release as a net Good Thing, but apples-to-apples comparisons of each org's latest models do have Meta holding steady in the middle of the pack.