https://swe-rebench.com
https://livebench.ai/#/
https://eqbench.com/#
https://contextarena.ai/?needles=8
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
https://artificialanalysis.ai/leaderboards/models
https://gorilla.cs.berkeley.edu/leaderboard.html
https://github.com/lechmazur/confabulations
https://dubesor.de/benchtable
https://help.kagi.com/kagi/ai/llm-benchmark.html
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
It's nice and simple in the overview mode though. Breaks it down into an intelligence ranking, a coding ranking, and an agentic ranking.
https://artificialanalysis.ai/
I use Firefox Mobile, so perhaps there is a difference on Chromium-based browsers?
https://swe-rebench.com
https://livebench.ai/#/
https://eqbench.com/#
https://contextarena.ai/?needles=8
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...
https://artificialanalysis.ai/leaderboards/models
https://gorilla.cs.berkeley.edu/leaderboard.html
https://github.com/lechmazur/confabulations
https://dubesor.de/benchtable
https://help.kagi.com/kagi/ai/llm-benchmark.html
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard