
by mustacheemperor 1112 days ago

>I don't believe that for a second.

This seems needlessly flippant and dismissive, especially when you could just crack open ChatGPT to verify, assuming you have Plus or API access. I just did, and ChatGPT gave me a well-reasoned explanation that factored in the extra details about racing that the other commenters noted.

>There are many examples where GPT4 fails spectacularly at much simpler reasoning tasks.

I'd posit it would be a more productive conversation if you shared some of those examples, so we can all compare them to the rather impressive example the top comment shared.

>I wouldn't trust GPT4 to tell me how much fuel I should put in my car. Would you?

Not if I was trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly.

1 comment

majormajor 1111 days ago

> Not if I was trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly.

It's not just testing reasoning, though; it's also testing fairly niche knowledge. I think a better test of pure reasoning would include all the rules and tips, like "it's good to have some buffer", in the prompt.
