| HN Mirror



	by stephantul 5 hours ago
	We’ve been on the receiving end of this complaint with Semble. I think it is a valid complaint, but constructing a benchmark for this kind of thing is just very difficult and expensive because of the (harness) x (model) x (MCP/CLI) combinations you have to cover. With traditional ML/tooling, not showing benchmarks was usually a red flag. But for LLM tooling, I’m not so sure.