Gaia2 and ARE: Empowering the Community to Evaluate Agents (huggingface.co)
5 points by mortimerp9 276 days ago
1 comment

Meta AI is releasing two new resources for AI agents research:

- GAIA 2 Benchmark: An updated approach to agents evaluation

• 800 dynamic scenarios across ten realistic universes

• Tests adaptability, robustness to failure, and time sensitivity

• Moves beyond static benchmarks to evaluate real-world agent capabilities

- Agents Research Environments (ARE): A simulation platform for agents research

• Dynamic, evolving environments that mirror real-world complexity

• Built-in reward signals and comprehensive evaluation tools

• Realistic apps (email, calendar, file system, messaging) with realistic data

• Event-driven architecture that creates dynamic scenarios for multi-turn tasks
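
To make the "event-driven" point concrete, here is a rough Python sketch of how scheduled events can change an environment between agent turns. This is not the ARE API: the `Inbox` and `Scenario` classes, their methods, and the example messages are all hypothetical names made up for illustration, assuming a simple simulated clock and a priority queue of timed events.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Inbox:
    """Toy stand-in for a simulated email app (hypothetical, not the ARE API)."""
    messages: List[str] = field(default_factory=list)


class Scenario:
    """Fires scheduled events against app state as simulated time advances."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: List[Tuple[float, int, Callable[[], None]]] = []
        self._counter = 0  # tie-breaker: events at the same time fire in insertion order

    def schedule(self, at: float, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (at, self._counter, action))
        self._counter += 1

    def advance(self, to: float) -> int:
        """Move the clock to `to`, firing every event due by then; return how many fired."""
        fired = 0
        while self._queue and self._queue[0][0] <= to:
            at, _, action = heapq.heappop(self._queue)
            self.now = at
            action()
            fired += 1
        self.now = to
        return fired


inbox = Inbox()
scenario = Scenario()
scenario.schedule(5.0, lambda: inbox.messages.append("Meeting moved to 3pm"))
scenario.schedule(12.0, lambda: inbox.messages.append("Meeting cancelled"))

# Turn 1: the agent acts at t=6 and sees only the first message.
scenario.advance(6.0)
seen_turn_1 = list(inbox.messages)

# Turn 2: by t=15 the environment has changed under the agent.
scenario.advance(15.0)
seen_turn_2 = list(inbox.messages)
```

The idea is that an agent that planned around the first message has to notice the second one and adapt, which is the kind of time sensitivity a static benchmark cannot exercise.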