by phatfish
120 days ago
Does anyone know what this "APEX-Agents benchmark for long time horizon investment banking, consulting and legal work" actually evaluates? That sounds so broad that creating a meaningful benchmark is probably as difficult as creating an AI that actually "solves" those domains.