OpenGameEval: Eval Framework to Benchmark Agentic AI Assistants (corp.roblox.com)
7 points by moneil971 188 days ago
1 comment

OpenGameEval offers a unique testing ground to evaluate core model capabilities related to agentic reasoning and long-horizon task solving.