| HN Mirror



	by eeasss 266 days ago
	Are there any LLMs in particular that work best with G-Eval?

2 comments

lyuata 266 days ago

An LLM benchmark leaderboard for common evals sounds like a fun idea to me.


zlatkov 266 days ago

I haven’t come across any research showing that a specific LLM consistently outperforms others as a G-Eval judge. G-Eval generally works best with strong reasoning models that produce consistent outputs.
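
One rough way to check the "consistent outputs" part for a candidate judge model is to score the same texts several times and look at how much the scores move. The sketch below is only an illustration: `judge` is a hypothetical callable standing in for whatever G-Eval call you use, and `score_spread` is a made-up helper name, not part of any library.

```python
import statistics
from typing import Callable, Sequence


def score_spread(
    judge: Callable[[str], float],  # hypothetical: wraps your own judge call
    outputs: Sequence[str],
    runs: int = 5,
) -> float:
    """Return the mean per-item standard deviation of the judge's scores.

    Each text is scored `runs` times. A value near 0 means the judge gives
    the same score on every repeat; larger values mean noisier judgments.
    """
    spreads = []
    for text in outputs:
        scores = [judge(text) for _ in range(runs)]
        spreads.append(statistics.pstdev(scores))
    return statistics.mean(spreads)
```

Comparing this number across a few candidate models on the same sample of texts gives a quick, informal sense of which one is more stable. It says nothing about whether the scores are accurate.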
