by eeasss 219 days ago
Are there any LLMs in particular that work best with G-Eval?
2 comments

An LLM benchmark leaderboard for common evals sounds like a fun idea to me.
I haven’t come across any research showing that a specific LLM consistently outperforms others as a G-Eval judge. G-Eval generally works best with strong reasoning models that produce consistent outputs.