Hacker News new | ask | show | jobs
by Rutledge 477 days ago
This initiative is designed to be community-driven, so we're looking forward to your feedback on what agent benchmarking needs exist in your domains. While starting with legal AI, we plan to expand across industries where benchmarks for AI agents evaluation are needed.