by gertlabs 57 days ago
Our philosophy is that you can design problems that keep scaling through a few release cycles by making the environments more complex, with no known ceiling. The key to scalability is not having a single correct answer (even though Victor's benchmark is interesting) while still being objectively scorable.

That's what we've done with our comprehensive reasoning and coding benchmark at https://gertlabs.com