Hacker News
Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments (arxiv.org)
2 points by Anon84 5 days ago