by RandomLensman 985 days ago
Because it fails to be able to use what it tells you, i.e., cannot apply concepts properly. For me that is a fail at "reasoning". Humans aren't always great at this either, and failure to reason isn't uncommon there.
2 comments

Interesting, could you elaborate a bit more? "Because it fails to be able to use what it tells you" is a bit abstract to my ear.
Get it to explain something, let's say dynamic hedging for derivatives, and then ask it to explain exactly how to hedge something specific. Or describe some physical situation with a quirk and then let it derive the implications. Someone on HN had an example of asking it to imagine entropy working in reverse in a cup of coffee with sugar dissolved. While it discussed sugar spontaneously forming crystals and other things, it never considered what the water would do, for example, let alone bigger issues such as whether water could even exist.

Again, humans are often poor at these things, too, but if it had "mechanized" reasoning capabilities instead of "replicative" ones (i.e., just repeating stuff), I would expect it to do generally better.

Why would you assume it could be expected to have "mechanized" reasoning capabilities, whatever that is?

I find these questions generally poor at gauging anything when people haven't given them to a representative sample of people first as a benchmark. Consider that not long ago there was a tedious trend of people posting "difficult" questions of orders of operations involving basic arithmetic, and a significant proportion of people in the threads would continue to belabour and argue for the wrong result even after having been told in excruciating detail how to apply the rules. In other words: I think people here tend to massively overestimate the reasoning ability of the average person.

E.g. to the example questions here, I'd bet the average person can't give a satisfactory definition of entropy, much less be able to tell what it does "forwards" before even considering "reverse". So why would we treat this as a benchmark of whether or not an LLM can reason?

Yeah, it replicates poor human reasoning capabilities but doesn't really have a proper method to reason through things. The latter is what I expect from a true machine intelligence.

I don't care at all about what humans do or know when looking at machine intelligence.

You might not care about it, but all of the people who regularly claim it can't reason certainly seem to.

Defining "true machine intelligence" without referencing the only intelligence most people would agree is "true" intelligence seems like a bizarre attempt at setting the bar unreasonably high, and defining "replicating poor human reasoning capabilities" to me is an admission from you that they do reason whether or not you think their ability to do so is "proper".

Replicating someone else's reasoning isn't reasoning. Otherwise, a book would "reason".

And, yes, most humans fail to reason properly a lot of the time. Any simple probability puzzle shows that.

Reasoning and intelligence aren’t binary yes/no things. It can have some capability for reasoning and intelligence while still being below what a human can do.
That is a point more orthogonal to mine. I am saying it has no general "method" to go from concept to application. It replicates poor human reasoning capabilities but doesn't have a method of reasoning that extends beyond that.
That’s mostly true for LLMs due to their training goal but not for all kinds of machine learning
Not OP but may have some idea of what they mean. Recently I had the following bizarre conversation with Bard. It appears to "know" what a word ladder is, and be able to explain it to me. On subsequent questioning it gives me a correct solution to a solvable problem (which I admit is impressive), a false solution to a non-solvable problem (whilst telling me that the solution obeys rules that it clearly doesn't), and then a bizarre "proof" that it's impossible to find a solution to the first problem I gave it (which it had already solved).

The 6-step supposed "solution" is the one that really gets me. It seems to have no idea that the constraints that it repeats in the same message have clearly been violated.

~~ME~~ What is a word ladder puzzle?

~~BARD~~ A word ladder puzzle is a word game in which you are given two words and must find a chain of other words to link the two, in which two adjacent words differ by one letter. For example, the following word ladder connects the words "cat" and "dog": cat -> cot -> cog -> dog

~~ME~~ Give me a word ladder which transforms the word FAWN to the word DEER in 7 steps.

~~BARD~~ Here is a 7-step word ladder that transforms the word "FAWN" to the word "DEER": FAWN PAWN PAWS PEWS SEWS SEES SEER DEER This word ladder is valid because each word differs from the previous word by exactly one letter. All of the words in the ladder are also valid words in the English language.

~~ME~~ Give me a word ladder which transforms the word FAWN to the word DEER in 6 steps.

~~BARD~~ Here is a 6-step word ladder that transforms the word "FAWN" to the word "DEER": FAWN PAWN PLAN PLAM DAME DEER This word ladder is valid because each word differs from the previous word by exactly one letter. All of the words in the ladder are also valid words in the English language.

~~ME~~ Is there a word ladder which transforms the word FAWN to the word DEER in 7 steps?

~~BARD~~ No, it is not possible to transform the word FAWN to the word DEER in 7 steps. The minimum number of steps required is 6.

Here is a proof by contradiction:

* Assume that there is a 7-step word ladder from FAWN to DEER.

* Each step in the word ladder must change exactly one letter.

* There are only 4 letters in the word FAWN.

* Therefore, at least one letter must be changed twice in the word ladder.

* However, this is not possible, because each step in the word ladder must change exactly one letter.

* Therefore, there cannot be a 7-step word ladder from FAWN to DEER.

I hope this explanation is helpful. Please let me know if you have any other questions.
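
For anyone who wants to check the two ladders above mechanically, here is a minimal sketch. The helper names (`hamming`, `bad_steps`) are made up for illustration, and it only tests the "adjacent words differ by exactly one letter" rule; it does not check that each word is in a dictionary, so a non-word like PLAM is not flagged by it.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def bad_steps(ladder: list[str]) -> list[tuple[str, str, int]]:
    """Return every adjacent pair that does not differ by exactly one letter."""
    return [
        (a, b, hamming(a, b))
        for a, b in zip(ladder, ladder[1:])
        if hamming(a, b) != 1
    ]


# The two ladders exactly as Bard gave them in the transcript.
seven = "FAWN PAWN PAWS PEWS SEWS SEES SEER DEER".split()
six = "FAWN PAWN PLAN PLAM DAME DEER".split()

print(len(seven) - 1, bad_steps(seven))
# 7 []
print(len(six) - 1, bad_steps(six))
# 5 [('PAWN', 'PLAN', 2), ('PLAM', 'DAME', 4), ('DAME', 'DEER', 3)]
print(hamming("FAWN", "DEER"))
# 4
```

So the first ladder really is 7 valid steps, while the "6-step" one has only 5 steps and three of them change more than one letter. Since FAWN and DEER differ in all four positions, any ladder needs at least 4 steps; that is only a lower bound and says nothing about whether a 6-step ladder exists.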

Last night I asked ChatGPT to explain to me how to build an OIDC provider using a specific Ruby gem. It gave me a solution that mostly ignored the gem. I pointed this out, it explained that the gem didn't actually exactly provide much that would reduce the code size, and offered up a version that used it more extensively anyway.

It used what I told it both in the original case, and gave me reasoning for why not using it much was a decent choice (and I verified that it was right), and showed me with an example that demonstrated it was able to reason about how my feedback related to the original answer and apply it. Later it went on, as a result of a subsequent question, and fleshed out the rest of the process. Everything it gave me worked.

To me that is a clear example that while it certainly fails to apply concepts fairly often (and often writes broken code), in other cases it does well. I'll add that this was after I'd spent some time searching for examples and I found nothing like what I suggested and I was about to resign myself to a slog through a lot of really bad documentation, and searching for some of what it suggested afterwards as well made it clear it did not just crib from training data.

For me, this is an example of it reasoning better about the subject than a whole lot of people I found discussing this subject in forum posts I came across, who often made mistakes the code it gave me did not or made assumptions that the code ChatGPT gave me made clear were wrong (as I could verify from the fact it worked)

On the other hand it struggles with something as simple as addition of large numbers that a determined child could do.

Nobody will claim it's consistently reasoning well. But I also regularly see it reason better than a lot of people I know about specific subjects, and so it's exasperating to see people dismiss individual examples of failure as evidence it "cannot apply concepts properly" rather than as individual datapoints.

People both over- and under-estimate how well it can reason based on the types of problems they put to it, and it's certainly an interesting subject how to gauge an "alien intelligence" like this that is so uneven in areas where we expect a relatively even basis and so have all kinds of heuristics for whether someone "knows".

This is part of the problem: We've all gone through a childhood and while we've picked up different things, we mostly have a shared floor that is relatively even across a wide range of basic skills. LLMs don't have that, and that messes with people's heads. Those of us who have gone into skilled professions similarly have all kinds of preconceptions about what a junior or senior developer looks like, for example, and LLMs do not fit neatly into those boxes.

They're dumb as small children in some areas, but still talk confidently about those subjects as if they were educated adults. That is a challenge and a problem. But that doesn't mean they're not able to reason about other subjects. Just not all of them.

Couple of points:

For me that points to reasoning happening by replication of sorts of often poor human output, but not by having a "mechanic" way to reason. As I said, humans are often poor at reasoning.

I also think code creation isn't a good area because it is narrower and more mechanically linked by probability than a lot of other areas (so token probability is potentially more informative). I could be wrong there, though.

What do you even mean by "mechanic" way to reason here?

And what do you expect it'd replicate? As I wrote, I tried looking to see if there were similar pieces of code online, and came up empty. I did that exactly because I was curious about the huge gap in quality between what I'd found before and what GPT4 came up with. Not least because it certainly is not something that happens every time.

> I also think code creation isn't a good area because it is narrower and more mechanically linked by probability than a lot of other areas (so token probability is potentially more informative).

I don't see why that would make it worse. Not least because it also makes it far easier to evaluate the outcome. If anything, we ourselves grasp for formalisms and structure when we want to ensure our reasoning is sound.

Again your use of "mechanically" here also makes absolutely no sense to me.

No, sorry, I view code creation as easier than other things.

I meant it replicates generally poor human reasoning capabilities but there is no general method to reason something out (because token probabilities are not informative to that end). You can train humans somewhat to that end, but it's not easy.

> No, sorry, I view code creation as easier than other things.

Then we will get nowhere, as it's trivially easy to stump even above averagely intelligent people with problems revolving around reasoning about code.

To me you've then set the bar at a level the vast majority of people can't meet and that's utterly absurd.

And code is just formalised language.

Formalised stuff might favor probabilistic approaches - that was my point.

Anyway, I think "intelligence" and "reasoning" are not always the same to start with.

Why is setting the bar high absurd? It is the same way I demand that my pocket calculator be so much better than humans at calculating things.