| HN Mirror



	by loudmax 796 days ago
	To be fair, a lot of humans fail that too, including people who should know better.


mjewkes 796 days ago

For sure. It's not a fair prompt at all. I'm super bullish on LLMs and am using GPT-4 in production right now. This stuff is magic.

It's actually hard to find short, simple, "plain English" failure cases like the above.

The "chain of reasoning" that the modern models deploy before they fail is funny too. This is GPT-4:

---

To determine the relationship between cherries and bananas based on your statements, let's break it down:

  1. Apples are better than bananas.
  2. Cherries are worse than apples.

From statement 1, we know apples rank higher than bananas. Statement 2 tells us cherries rank lower than apples. By this logic, since cherries are lower than apples, which are higher than bananas, it follows that cherries are also lower than bananas.

Therefore, based on these comparisons, cherries are not better than bananas.


mjewkes 796 days ago

Notably, if you ask it to transform the statements to formal logic, you get a correct response! This stuff is truly magic.

https://chat.openai.com/share/81e45fef-a72b-4258-98d6-5c8190...


anon373839 796 days ago

This makes sense to me. If you think about the training data, texts working through problems using formal predicate logic are likely to be correct, and much more likely to be precise about what information is (or isn’t) contained in the propositions. So if you formulate the problem in this language, you’re prompting the model to sample from patterns that are more likely to give you the result you want. Whereas if you use regular English, it could be sampling from cooking blogs or who knows what.
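
To make the "what information is (or isn't) contained in the propositions" point concrete, here is a minimal Python sketch that brute-forces every ranking of the three fruits and keeps the ones consistent with the two statements GPT-4 restated above. It assumes "better than" is a strict ordering with no ties, and the function names are made up for illustration.

```python
from itertools import permutations

FRUITS = ("apples", "bananas", "cherries")


def consistent_rankings():
    """Return every strict ranking (best first) that satisfies both premises.

    Assumption: "better than" is a strict total order with no ties.
    """
    rankings = []
    for order in permutations(FRUITS):
        # Lower index means "better".
        rank = {fruit: i for i, fruit in enumerate(order)}
        apples_beat_bananas = rank["apples"] < rank["bananas"]
        cherries_below_apples = rank["cherries"] > rank["apples"]
        if apples_beat_bananas and cherries_below_apples:
            rankings.append(order)
    return rankings


def cherries_vs_bananas():
    """Collect the truth values of 'cherries are better than bananas'
    across all rankings that satisfy the premises."""
    return {
        order.index("cherries") < order.index("bananas")
        for order in consistent_rankings()
    }


if __name__ == "__main__":
    for order in consistent_rankings():
        print(" > ".join(order))
    print(cherries_vs_bananas())
```

Two rankings survive, one with cherries above bananas and one with cherries below, so the premises alone do not settle the question. The "it follows that cherries are also lower than bananas" step in the GPT-4 output is not entailed by the two statements.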
