by mortimerp9 276 days ago

Meta AI is releasing two new resources for AI agents research:

- GAIA 2 Benchmark: An updated approach to agents evaluation

• 800 dynamic scenarios across ten realistic universes

• Tests adaptability, robustness to failure, and time sensitivity

• Moves beyond static benchmarks to evaluate real-world agent capabilities

- Agents Research Environments (ARE): A simulation platform for agents research

• Dynamic, evolving environments that mirror real-world complexity

• Built-in reward signals and comprehensive evaluation tools

• Realistic apps (email, calendar, file system, messaging) with realistic data

• Event-driven architecture that creates dynamic scenarios for multi-turn tasks
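
To make the event-driven idea concrete, here is a rough Python sketch of how scheduled events can change an environment between an agent's turns. It is not ARE's actual API: `ToyEnvironment`, `Event`, `schedule`, and `advance` are hypothetical names invented for illustration, and the inbox/calendar state is a stand-in for the apps listed above.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Event:
    """A scheduled change to the environment (hypothetical, not ARE's type)."""
    time: float
    seq: int  # tie-breaker so events at the same time keep insertion order
    action: Callable[[dict], None] = field(compare=False)


class ToyEnvironment:
    """Minimal event-driven world: state changes only when events fire."""

    def __init__(self) -> None:
        self.now = 0.0
        self.state = {"inbox": [], "calendar": []}
        self._queue: list[Event] = []
        self._seq = 0

    def schedule(self, time: float, action: Callable[[dict], None]) -> None:
        # Push onto a min-heap keyed by (time, seq).
        heapq.heappush(self._queue, Event(time, self._seq, action))
        self._seq += 1

    def advance(self, until: float) -> None:
        # Fire every event due at or before `until`, in time order.
        while self._queue and self._queue[0].time <= until:
            event = heapq.heappop(self._queue)
            self.now = event.time
            event.action(self.state)
        self.now = until


env = ToyEnvironment()
env.schedule(5.0, lambda s: s["inbox"].append("Meeting moved to 3pm"))
env.schedule(2.0, lambda s: s["calendar"].append("Meeting at 2pm"))

# Turn 1: the agent observes the world at t=3; only the calendar event has fired.
env.advance(3.0)
snapshot_turn1 = (list(env.state["inbox"]), list(env.state["calendar"]))

# Turn 2: by t=10 the email has arrived, so the agent's earlier plan may be stale.
env.advance(10.0)
snapshot_turn2 = (list(env.state["inbox"]), list(env.state["calendar"]))
```

In this toy version the agent would have to re-read the inbox on its second turn to notice the change, which is the kind of adaptability and time sensitivity a dynamic scenario is meant to test.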