|
It's not good at reasoning. I asked the following (the term substitutions are there to avoid parroting):
Let's define the following terms:
col means "and", pel means "or" and erf is equivalent to "col" if the first argument evaluates to "true", otherwise it's equivalent to "pel"
can you evaluate the following statement?
col(true, pel(false, true), erf(col(true, false), true))
And it said this: ...
Next, we look at the expression "erf(col(true, false), true)". Since the first argument is true, erf is equivalent to col, which means we evaluate the entire expression as:
col(true, false) = false
Therefore,
erf(col(true, false), true) = col(true, true) = true
which is obviously wrong because the first arg is false. The final answer happens to come out the same anyway. So I then asked: what about col(true, pel(false, true), erf(col(true, false), false))
and this is its full answer:
Sure, let's evaluate the expression "col(true, pel(false, true), erf(col(true, false), false))" step by step:
First, we look at the expression "pel(false, true)". This is an "or" operation, so we evaluate it as follows:
pel(false, true) = true
Next, we look at the expression "erf(col(true, false), false)". Since the first argument is true, erf is equivalent to col, which means we evaluate the entire expression as:
col(true, false) = false
Therefore,
erf(col(true, false), false) = pel(false, false) = false
Finally, we look at the expression "col(true, pel(false, true), erf(col(true, false), false))". Using the results from steps 1 and 2, we can evaluate this as:
col(true, true, false) = true
Therefore, the overall value of the statement "col(true, pel(false, true), erf(col(true, false), false))" is true.
You can see it's mostly random nonsense masked as "reasoning". It's not even consistent from one sentence to the next. I assume a bigger model might do better, but I suspect that as soon as you increase the example size it will get it wrong again. |
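For reference, the two expressions can be checked mechanically. This is a minimal Python sketch, not anything the model produced. It assumes "col" and "pel" accept any number of arguments (the question calls "col" with three) and that "erf" always takes exactly two; the variable names are made up for illustration.

```python
def col(*args: bool) -> bool:
    """'col' means "and". Variadic, since the question calls it with three arguments."""
    return all(args)


def pel(*args: bool) -> bool:
    """'pel' means "or"."""
    return any(args)


def erf(first: bool, second: bool) -> bool:
    """Behaves like col if the first argument is true, otherwise like pel."""
    return col(first, second) if first else pel(first, second)


# First question from the comment above.
first_question = col(True, pel(False, True), erf(col(True, False), True))

# Follow-up question, with the last argument of erf changed to false.
second_question = col(True, pel(False, True), erf(col(True, False), False))

print(first_question)   # True
print(second_question)  # False
```

Under these assumptions the first expression is true and the second is false, so the model's "true" for the second one is wrong. As a side note, with exactly two arguments erf(a, b) always comes out equal to b, whichever branch is taken.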
Keep in mind that LLMs like GPT 4 do not have a scratchpad and can't back-track. Asking it a question is the rough equivalent of asking a human standing in the middle of the room a question. No paper, no whiteboard, nothing. Just a demand for a verbal answer then and there, from a standing start.
I've only met a couple of humans who would be able to solve a problem like this one under those circumstances.
Despite its impediments GPT 4 solved this correctly:
Alternatively, to give an idea of the upcoming "Wolfram Alpha" plugin, you can instead ask it to convert the expression into the equivalent Mathematica program:
Very nice! Unfortunately, it didn't notice that "col" is used with three arguments, not just two. This is a mistake many (most?) humans would make also.
No worries, you can ask it to correct that. It'll apologise and produce this:
Which evaluates to:
There you go. Now go find me a random human who can do any of this from a standing start, no preparation, no warning, no tools, nothing but their wits.
As a bonus, here's a related joke: