by arnaudsm 377 days ago
- https://livebench.ai/#/ + AIME + LiveCodeBench for reasoning

- MMLU-Pro for knowledge

- https://lmarena.ai/leaderboard for user preference

So far we only have Magistral's GPQA, AIME, and LiveCodeBench scores.