Meta AI is releasing two new resources for AI agents research:

- GAIA 2 Benchmark: an updated approach to agent evaluation
  • 800 dynamic scenarios across ten realistic universes
  • Tests adaptability, robustness to failure, and time sensitivity
  • Moves beyond static benchmarks to evaluate real-world agent capabilities
- Agents Research Environments (ARE): a simulation platform for agents research
  • Dynamic, evolving environments that mirror real-world complexity
  • Built-in reward signals and comprehensive evaluation tools
  • Realistic apps (email, calendar, file system, messaging) populated with realistic data
  • Event-driven architecture that creates dynamic scenarios for multi-turn tasks