Hacker News
by yohji1984 5 days ago
Sorry, I missed the Open Items section. You're right that designing a good eval harness can be difficult and expensive. Maybe we need some kind of community project for agentic evals, where people can share eval harnesses and run logs.