HN Mirror



	by turnsout 393 days ago
	Yes, this was a great article. We need more of this independent research into LLM quirks & biases. It's all too easy to whip up an eval suite that looks good on the surface, without realizing that something as simple as list order can swing the results wildly.