Hacker News
by tangentstorm 988 days ago
I tested their first example in ChatGPT 4.

Interestingly, it gives the correct answer on the first try:

  https://chat.openai.com/share/d86fe16a-9dfd-4753-8eaf-6d2948096ea3

I then gave GPT4 a "chain-of-thought" flavored prompt, telling it to treat the problem like a geometry proof. It gave the same incorrect answer as GPT3 did in the paper. I then told it to "Review your work and check for mistakes." With this follow-up, it checked each line of the proof and was able to find and explain the error:

  https://chat.openai.com/share/c4ce6e98-43e3-4547-a4c8-380c1d1cc5fe

GPT3.5, given the same prompts, was still confident in its incorrect answer:

  https://chat.openai.com/share/1a0be419-092d-4dbb-a6c3-79e61914fd0d

(edited to update links and to ask each version to check its work twice)

>> I then told it to "Review your work and check for mistakes." With this follow-up, it checked each line of the proof and was able to find and explain the error:

And after that? Did you tell it that it may be wrong and ask it to check again?

Good point. I have now. Both models stuck with their second answer.

(I see now I didn't know how to share ChatGPT links properly. Updating the links now...)