|
|
|
|
|
by rTX5CMRXIfFG
24 days ago
|
|
Am I crazy, or are these differences between the best models so marginal that you’d get roughly the same performance if you use the same high-quality harness (ie preloaded instructions from md files, including custom skills)? |
|
It's like most people just watching a 'starting nba player' (not superstar, but just starting player) vs one that sits on the bench.
If you were to just watching them play, work out, shoot - you'd never notice the difference.
Put them head to head and it's 98-54 and you start to see the patterns.
It's pretty interesting actually, someone tell me what the 'science' for this is, I'm sure there is some kind of information theory at work here.
Software has innumerable kinds of problems at varying level of complexity and so it provides the perfect testbed for seeing how far models can go in practice.
Should add: you're very right to hint that harness, tooling, and models tuned o both the harness and he kinds of things people do on the harness, as well as some other things do make enormous difference.
Bu and large, SOTA Codex/Claude Code are substantially better - at least for now. That may change.