| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by majormajor 1111 days ago
	> Not if I was trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly. It's not just testing reasoning, though, it's also testing fairly niche knowledge. I think a better test of pure reasoning would include all the rules and tips like "it's good to have some buffer" in the prompt.