Hacker News
Establishing Best Practices for Building Rigorous Agentic Benchmarks (arxiv.org)
4 points by frontfor 342 days ago