Hacker News
Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents (arxiv.org)
2 points by pontiacbandit8 396 days ago