Hacker News
Agenteval.org: An Open-Source Benchmarking Initiative for AI Agent Evaluation (scorecard.io)
6 points by Rutledge 483 days ago
1 comment

This initiative is designed to be community-driven, so we'd welcome your feedback on which agent benchmarking needs exist in your domain. We're starting with legal AI and plan to expand to other industries that need benchmarks for AI agent evaluation.