
	by andrewmcwatters 364 days ago
	I think anyone frequenting HN and actually using these tools absolutely knows these benchmarks are 100% bullshit and the only real way to test these things is to just use them yourself. Many small models are supposedly good for controlled tasks, but given a detailed prompt, I can't get any of them to follow simple instructions. They usually just regurgitate the examples in the system prompt. Useless.