Hacker News
by chaidhat 52 days ago
Maybe we are: https://andonlabs.com/evals/vending-bench-2