| HN Mirror


	by MakeAJiraTicket 116 days ago
	Thank you! Gemini has consistently been the best performer I've tried, but the models always require the connection to be made explicit. The point of the test is that it is very low-complexity and narrowly targeted at what can be considered reasoning, and these models can't produce the connection without prodding. In the ideal case of reasoning, you would simply present the methods and the models would bridge the gap independently once both are at the forefront of their context, but that doesn't happen.

mapontosevenths 116 days ago

ChatGPT got it with less prodding, but I had to set it to "Pro" thinking mode (ChatGPT's version of Deep Think, I suspect). I'm sure Deep Think could get it with even less prompting.

I think your conclusion that they aren't really thinking doesn't hold. They're already there; it just costs more money and time to get good results.

https://chatgpt.com/share/69a12666-64b0-8009-8dfe-59546ac400...

EDIT - Updated the link to include the full conversation. Note that I didn't change it to pro mode until the end, and eventually got tired of waiting and just told it "answer now."

MakeAJiraTicket 116 days ago

This is the expected result. The point where you had to ask "Do you see the connection?" is where it failed to bridge the gap on its own. I don't know whether Pro mode is relevant, but the models need prodding from someone who already knows the invention before they can reach it themselves.

They capture the gestalt of reasoning: they can reason in patterns that we have encoded in language, but they can't do genuine reasoning.

mapontosevenths 116 days ago

I'm not sure a lack of intuition implies a lack of reasoning. They clearly didn't make that jump until they were told to look for something, but did pretty handily once asked. Clearly they used some version of reasoning to do that, but just as clearly had no interest in it at all until directed to look for it.

I wonder whether, if we phrased it differently, we could get them to make the leap without so much hinting. I'll tinker with it a bit later.

Either way, very cool experiment! Thanks for posting it. I'd upvote you again if I could.
