
by civilized 1160 days ago

I've done several experiments (and posted the results in previous HN comments) where I've given GPT puzzles or brainteasers and asked it to review aspects of its answers Socratically. I never told it that it got anything wrong; I just asked, "You said A, then you said B. Does that make sense?"

It usually does notice inconsistencies between A and B when asked this. But its ways of reconciling inconsistencies can be bizarre and suggest a very superficial understanding of concepts.

For example, it once reconciled an inconsistency by saying that, yes, 2 * 2 = 4, but if you multiply both sides of that equation by a big number, that's no longer true.

I will be super impressed the day we have a model that can read an arithmetic textbook and come out with reliable arithmetic skills.

faizshah 1160 days ago

I have run into the same issue when using it for coding. It can easily debug simple code, but with tools like Bazel I went down a two-hour rabbit hole letting it debug an error, and it failed every time. Even with chain-of-thought prompting, it had a very shallow understanding of the issue. Eventually I had to debug it myself.

RheingoldRiver 1160 days ago

> For example, it once reconciled an inconsistency by saying that, yes, 2 * 2 = 4, but if you multiply both sides of that equation by a big number, that's no longer true.

Fair enough, but have you explained the axioms of arithmetic to it? It has only memorized the examples it has seen, so it has a right to be skeptical until it has seen our axioms and the proofs of what is always true in mathematics.

When I was a child, I was skeptical that an odd number plus an even number is always odd (and similar facts) for very large numbers, until I saw it proven by induction. I was 6, I think; IMO this was reasonable skepticism.

To be fair, ChatGPT has probably seen these proofs, but it may not be connecting the dots well enough yet. I would expect that from a later version specifically trained to understand math (by which I really mean math, not just performing calculations). And imagine what things it will prove for us then!
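
For reference, the parity fact mentioned above also has a short direct proof from the definitions (a direct argument, not the induction proof the commenter describes). Write the odd number as 2a + 1 and the even number as 2b, with a and b integers:

```latex
(2a + 1) + 2b = 2(a + b) + 1, \qquad a, b \in \mathbb{Z}
```

Since a + b is an integer, the sum has the form 2k + 1 and is therefore odd, however large a and b are.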

civilized 1160 days ago

I think GPT has read about as many textbooks on arithmetic as I have, and the difference between us is entirely in the intelligence to absorb the contents and apply them logically with consistent adherence to the rules.

I think one problem with these models is that all their knowledge is soft. They never learn true, universal rules. They seem to know the rules of grammar, but only because they stick to average-sounding text, and the average text is grammatical. At the edges of the distribution of what they've seen, where the data is thin, they have no rules for how to operate, and their facade of intelligence quickly falls apart.

People can reliably add numbers they've never seen before. The idea that it would matter whether the number has been seen before seems ridiculous and fundamentally off-track, doesn't it? But for GPT, it's a crapshoot, and it gets worse the farther it gets away from stuff it's seen before.
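
To illustrate why novelty shouldn't matter, here is a minimal Python sketch of grade-school addition on decimal digit strings. It is an illustration of the human procedure, not a claim about how GPT works internally, and the name `add_digit_strings` is hypothetical. The same carry rule is applied at every digit position, so a 200-digit number that has almost certainly never been written down before is handled exactly like a 2-digit one.

```python
import random


def add_digit_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal digit strings,
    using the grade-school right-to-left carry procedure."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        # New carry is the tens part, the written digit is the ones part.
        carry, digit = divmod(da + db + carry, 10)
        result.append(str(digit))
        i -= 1
        j -= 1
    # Digits were produced least significant first; drop leading zeros.
    return "".join(reversed(result)).lstrip("0") or "0"


# Two random 200-digit numbers (seeded so the run is reproducible).
random.seed(1)
x = "".join(random.choice("0123456789") for _ in range(200))
y = "".join(random.choice("0123456789") for _ in range(200))
# Cross-check against Python's built-in arbitrary-precision integers.
assert add_digit_strings(x, y) == str(int(x) + int(y))
```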

sharemywin 1160 days ago

In computer logic, you would get an undefined result if the number were large enough.
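
What "large enough" means depends on the number type. In C, overflowing a signed integer is undefined behavior, while Java's `long` and Rust's `i64::wrapping_mul` wrap around modulo 2^64. Python's built-in `int` is arbitrary precision and never overflows, so the minimal sketch below simulates 64-bit two's-complement wraparound by hand; the helper `wrap_int64` is hypothetical, and wraparound (rather than C's undefined behavior) is the assumption being modeled.

```python
# Assumption: 64-bit two's-complement wraparound, as in Java's long.
INT64_MIN = -(2**63)


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int into the signed 64-bit range by wraparound."""
    return (value - INT64_MIN) % 2**64 + INT64_MIN


x = 2**62
exact = 4 * x                  # arbitrary precision: 2**64
wrapped = wrap_int64(4 * x)    # 2**64 wraps around to 0 in 64 bits
# Wrapping each step of 2 * (2 * x) still matches wrapping 4 * x,
# because both sides are equal modulo 2**64.
both_sides_agree = wrap_int64(2 * wrap_int64(2 * x)) == wrap_int64(4 * x)
print(exact, wrapped, both_sides_agree)
```

Note that overflow changes the computed value, but under wraparound both sides of 2 * 2 * x = 4 * x are still reduced identically.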

civilized 1160 days ago

It doesn't work with numbers as computer numbers though. It works with them as decimal digit strings, just like humans do.
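
As a rough illustration of "working with decimal digit strings", here is a minimal Python sketch of grade-school long multiplication. It needs no fixed-width number type, so nothing overflows. This models the human procedure only and makes no claim about GPT's internals; the function name `mul_digit_strings` is hypothetical.

```python
import random


def mul_digit_strings(a: str, b: str) -> str:
    """Multiply two non-negative integers given as decimal digit strings
    using grade-school long multiplication."""
    # One slot per possible output digit, least significant first.
    digits = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = digits[i + j] + int(da) * int(db) + carry
            carry, digits[i + j] = divmod(total, 10)
        # Final carry of this row goes into the next higher position.
        digits[i + len(b)] += carry
    text = "".join(str(d) for d in reversed(digits)).lstrip("0")
    return text or "0"


# A seeded 300-digit "random-ish" multiplier.
random.seed(2)
x = "".join(random.choice("0123456789") for _ in range(300))
left = mul_digit_strings(mul_digit_strings("2", "2"), x)   # (2 * 2) * x
right = mul_digit_strings("4", x)                          # 4 * x
assert left == right == str(4 * int(x))
```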

Paul-Craft 1160 days ago

Make the number you multiply by a long string of random digits, and I can just about guarantee that most humans will get different results on the two sides, because they'll make one or more mistakes doing the math. That assumes, of course, that they don't have conventional computing tools that can handle numbers that large.

civilized 1159 days ago

Not sure how this is relevant to the discussion.

Paul-Craft 1159 days ago

You don't see how asking humans to multiply both sides of 2 * 2 = 4 by the same very large, random-ish number, and expecting them to get different results, is relevant to this:

> 2 * 2 = 4, but if you multiply both sides of that equation by a big number, that's no longer true.

You know, the very same scenario I pulled from your comment?

civilized 1159 days ago

It's not the same issue. I was talking to GPT about the strings 2 * 2 * x and 4 * x, not the multiplied-out versions.
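
Read symbolically, 2 * 2 * x and 4 * x are equal for every x: multiplication is associative, so the grouping does not matter, and 2 * 2 = 4. No digit-by-digit multiplication is needed to see it:

```latex
2 \cdot 2 \cdot x = (2 \cdot 2) \cdot x = 4 \cdot x \qquad \text{for every number } x
```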

int_19h 1159 days ago

Was it GPT-3.5 or GPT-4?

civilized 1159 days ago

GPT-3.5. People keep telling me GPT-4 is so much better, but I don't know where I can access it for free and I'm not interested in paying for it.

But if anyone wants to give it to me for free, I would happily make a $1000 bet that I can get GPT-4 to make the same mistake.

int_19h 1159 days ago

There's no free tier that I know of. But yes, it is drastically better; in particular, it's much less prone to hallucinating "proofs" that its previous answer was correct when you challenge it.

If you provide the inputs for some specific task where you expect GPT-4 to fail in this manner, I can give it a try.
