Hacker News
Beyond Benchmark Maxxing: Measuring Open Source Models as Real-World Agents (ultravox.ai)
1 point by zkoch 298 days ago