by freediver 666 days ago
Short answer is no, because there is no 'standardized' use case.

One thing is certain: the commonly used benchmarks are mostly polluted (their test data has leaked into training sets) and worthless, so you have to go to niche ones.

For example, the one I check for coding is the Aider LLM leaderboard [1].

We maintain the Kagi LLM Benchmarking Project [2], which is optimized for the use case of using LLMs in search.

[1] https://aider.chat/docs/leaderboards/

[2] https://help.kagi.com/kagi/ai/llm-benchmark.html