by mustacheemperor
1112 days ago
>I don't believe that for a second.

This seems needlessly flippant and dismissive, especially when you could just crack open ChatGPT to verify, assuming you have Plus or API access. I just did, and ChatGPT gave me a well-reasoned explanation that factored in the extra details about racing the other commenters noted.

>There are many examples where GPT4 fails spectacularly at much simpler reasoning tasks.

I posit it would be a more productive conversation if you would share some of those examples, so we can all compare them to the rather impressive example the top comment shared.

>I wouldn't trust GPT4 to tell me how much fuel I should put in my car. Would you?

Not if I was trying to win a race, but I can see how this particular example is a useful way to gauge how an LLM handles a task that looks at first like a simple math problem but requires some deeper insight to answer correctly.
It's not just testing reasoning, though, it's also testing fairly niche knowledge. I think a better test of pure reasoning would include all the rules and tips like "it's good to have some buffer" in the prompt.