by robot-wrangler 36 days ago
> Data here: https://gertlabs.com/rankings?mode=agentic_coding

Oh wow, we got "tribal domination", "market simulator" and "adversarial customer service". I don't know what those are, but they sure sound like big torment nexus milestones.

Maybe we could at least play nicer games like hackenbush and act surprised when there's some wicked use-case that's isomorphic.

EDIT: Ok, fine. I like "Rubik's Cube Chess" a lot. Never heard of it. Has it been analyzed formally at all? It's hard to search for since there are tons of name collisions.

1 comment

Not formally analyzed -- in practice, we see a lot of repetition/draws from code submissions. Our version is custom and uses more pawns, which can move in any direction but don't promote to other pieces. We try to include just as many cooperative games as competitive games, but both are important for measuring model ability in the real world.