Hacker News
by YetAnotherNick 587 days ago
No, because solving a well-defined problem with a clear right or wrong answer is generally not what people use LLMs for. Most of the time my queries to an LLM are underspecified, and a lot of the time I only figure out what the problem actually is while chatting with the LLM. A benchmark, by definition, only measures whether an answer is right or wrong.