by gertlabs
57 days ago
Our philosophy is that you can design problems that scale through a few release cycles by making the environments more complex, with no known ceiling. The key to scalability is not having a single correct answer (even though Victor's benchmark is interesting) while still being objectively scorable. That's what we've done with our comprehensive reasoning and coding benchmark at https://gertlabs.com