Hacker News
by Aperocky 1265 days ago
GPT has no reasoning ability; it has billions of parameters that let it pretend it does, purely going off previously digested material.

As soon as it comes across a reasoning process that has not been seen before in the training set, which can be as easy as a middle school math question, it fails, because it has no ability to extrapolate logic.

If it manages to pass the bar exam, that says more about the bar exam than it does about GPT.

7 comments

Most jobs today don't need novel reasoning. This is the equivalent of the steam machine for intelligence.

During industrialization, machines did not replace all jobs, but they replaced or changed most of them. The same will happen here.

A typical office job will have a few hours a week of actual, intensive thought. The vast majority of time will be spent doing simple, repetitive work. This work can be automated, or at least significantly sped up, using technology like GPT.

“write an API client for …”, “integrate APIs … and …” can easily be automated. Yes, you'll still have to write the business logic, but that's not the majority of your work today. You could even have it write unit tests based on the JIRA ticket description.

The same applies to many other jobs.

> You could even have it write unit tests based on the JIRA ticket description.

This is a wonderful point: writing unit tests is exactly the kind of mind-numbing tedium that I'm super excited to automate away.

> Most jobs today don't need novel reasoning. This is the equivalent of the steam machine for intelligence.

Like the point above; that says more about the work.

It’s going to be really interesting to see how the middle-class narrative pushes back on AI revealing how little work is actually done during office hours.

This boilerplate code can be, and is, automated away using deterministic frameworks. No need to introduce a black box and be responsible for debugging the stuff it creates, which sounds far more painful than the alternatives.

That's true today, but think about all the work you do that takes basically no conscious effort, but is still not automated yet.

GPT can be of use there. As long as you're working with languages that use strict static types and have proper tests, it's easy to automate and ensure there are no mistakes.
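A minimal sketch of that workflow, with Python type hints standing in for a statically typed language. Everything here is hypothetical: `dedupe_keep_order` plays the role of a machine-generated helper, and `check_generated_code` is the human-written test gate it has to pass before being accepted.

```python
from typing import List


def dedupe_keep_order(items: List[str]) -> List[str]:
    """Hypothetical machine-generated helper: drop repeats, keep first occurrences."""
    seen = set()
    out: List[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_generated_code() -> bool:
    """Human-written tests the generated helper must pass before it is accepted."""
    assert dedupe_keep_order([]) == []
    assert dedupe_keep_order(["a", "b", "a"]) == ["a", "b"]
    assert dedupe_keep_order(["x", "x", "x"]) == ["x"]
    return True


print(check_generated_code())  # True
```

The gate is only as strong as the cases someone wrote into it, so in practice the tests carry most of the weight.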

You are implying either:

* Understanding complex language does not require logic/reasoning,

* There are infinitely many forms of logic/reasoning or at least more than those existing in a vast training set.

Neither of which is likely true.

What do you think of the Minerva system, which can solve multi-step quantitative reasoning questions better than many competent students and most adults?

https://ai.googleblog.com/2022/06/minerva-solving-quantitati...

Note: If you look at LSAT test samples, many questions are tests of complex logical reasoning, a requisite for legal professions.

You nailed what I find discomforting about these discussions. They’re incredibly narrowly focused on a specific implementation that satisfies hitherto unsolved problems, and criticize it by pointing out that it doesn’t handle already solved problems. But surely folks realize the human brain isn’t a single monolithic processing program but an ensemble of specialized subsystems that organize to form the mind. Why would you assume you wouldn’t do the same with AI systems? We’ve been tackling reasoning, inference, problem solving, information retrieval, mathematics, logic, and other domains for decades with some stupendous results. But they lacked the ability to ingest and translate language into some intermediate semantic form and take output and reconstruct it into a human language. Likewise, vision and audio processing for input and output have been a struggle until recently.

I also really strongly disagree that it’s basically doing some sort of information retrieval where, based on language, it regurgitates some sort of Markov expectations. You can ask it to do very complex translations of a concept from one domain to another, expressed in a form that’s certainly never been done before, and it does it with alacrity. At the very minimum it “remembers” things from earlier in the conversation and can associate the semantic ideas across prompts and synthesize cogent responses. That in itself implies it has some semantic “understanding” of the structure of the language. That is a huge missing piece in our tool kit to date.

Frankly I feel these threads expose just how jaded and unable to dream we have become, that even when a wonder walks up and hits us in the nose we can’t even see it.

Language prediction models are not a closely guarded secret. I suggest looking into academic papers about what they are, and maybe even reading or writing an implementation yourself.

There is no magic; it is just a more complicated transpose, created by training over perhaps 10% of all available text on the internet.

It does have a lot of uses. For one, I think it would probably put Grammarly out of business, and maybe even do some work for law firms.

> Understanding complex language does not require logic/reasoning

The key is understanding. It does not need to understand: it has already seen the question asked in 100 different ways, and it has also seen the answers to all of those. It just rephrases those answers via a neural network, and that happens to pass the bar exam.

> There are infinitely many forms of logic/reasoning or at least more than those existing in a vast training set.

More importantly, the differences between forms are subtle and cannot be understood. That's why ChatGPT confidently gives wrong answers on Stack Overflow: https://meta.stackoverflow.com/questions/421831/temporary-po...

The LSAT tests formal logic. Some of it is complicated. Much less of it is required for the practice of law.

Src: scored 99.8th percentile on LSAT, tutored it, now working at major law firm

Also just adding to my earlier reply (can't edit), none of it is "complex" relative to the complexity of some of the concepts in computer science or more brainy parts of complicated professional software development.

Most people's reasoning ability functions at this level.

I would argue that it does not matter. The AI could even be "smarter" on pure IQ/reasoning, but in terms of the practical reasoning humans need, which depends on exposure to the real world, AIs will still take decades to catch up.

The radiology AIs are technically more accurate than radiologists on any sufficiently large dataset, and yet they still have not replaced radiologists (or even come anywhere close).

There’s an interesting side road here: a lot of medical jobs are there because of legal culpability. If someone fails, the blame can be placed on them and not on the hospital that hired them. It might be a long time before hospitals get that same protection with AI (probably miles and miles of contracts negotiated with a 3rd party who owns and operates the AI that’s used).

And in fact we want them to act outside a professional capacity in which they will be dangerous liabilities. ...Is not this piece about "Bar Exams"?

Which should be worrying. It's also why most people don't pass (nor try) the bar exam.

But let's be real. You can pass the bar exam with a lot of studying and memorizing (and many people do just that). A better test for whether the AI can do reasoning is some sort of puzzle which requires a creative solution.

Not really, this demonstrates that the bar exam doesn't require a large amount of reasoning, just lots of recitation.

Humans are bad at reciting things, but a lot better (compared to GPT) at reasoning.

> As soon as it comes across a reasoning process that has not been seen before in the training set, which can be as easy as a middle school math question

Is this true even if you tell it to show its working? In my experience that drastically improves its ability to do math problems.

At the end of the day it's still a language prediction model.

Which means whatever apparent logic you're getting out of it comes from text that it has learned: not the reasoning embedded within that text, but the actual text itself.

It can't solve quadratic equations.

Prompt: solve x^2 + 11x + 30 = 0. Show your working

Answer: To solve this equation, we can use the quadratic formula:

x = (-b +/- sqrt(b^2 - 4ac)) / (2a)

Plugging in the values for a, b, and c, we get:

x = (-11 +/- sqrt(11^2 - 4*1*30)) / (2*1)

x = (-11 +/- sqrt(121 - 120)) / 2

x = (-11 +/- sqrt(1)) / 2

x = (-11 +/- 1) / 2

x = -5 or x = -6

So the solutions to the equation are x = -5 and x = -6.
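
As a sanity check on the working above, here is a minimal Python sketch that applies the same quadratic formula and then substitutes the roots back into the equation. The function name `solve_quadratic` is made up for illustration, and it assumes a nonzero leading coefficient and real roots.

```python
import math
from typing import Tuple


def solve_quadratic(a: float, b: float, c: float) -> Tuple[float, float]:
    """Return the real roots of a*x^2 + b*x + c = 0, larger root first."""
    disc = b * b - 4 * a * c  # discriminant b^2 - 4ac
    if a == 0 or disc < 0:
        raise ValueError("expected a quadratic with real roots")
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))


roots = solve_quadratic(1, 11, 30)
print(roots)  # (-5.0, -6.0)

# Substitute each root back into x^2 + 11x + 30; both residuals should be 0.
residuals = [x * x + 11 * x + 30 for x in roots]
print(residuals)  # [0.0, 0.0]
```

Both roots match the quoted answer, so the arithmetic in the transcript checks out.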

That’s true of almost all humans too. You’re self selecting for the elite you belong to.

Largely yes. Some prompts can make it better or worse.

Until someone can point out the difference between neuronal reasoning and silicon reasoning, I remain completely agnostic about the underlying mechanics of whatever model.

Gun to my head where I had to put money down, I would put it on "Brains are not nearly as special as we (they?) think they are." No fairy dust or supernatural beings required, brains are just another AI model (and likely not even a particularly great one).

Human brains helped humans survive for a long time.

An AI that survives that long surely has to be great. Probably you meant that human brains are not made for the world of today.

I've already been using GPT and ChatGPT with much success for my work.

Yes, it doesn't have reasoning ability, but being able to manage knowledge and information in the way that these models can is still an amazing feat.

It does have some ability to extrapolate to new problems, provided its training corpus has reasonably close coverage. It is not going to be making new scientific discoveries or insights but then neither are most people. With a sufficiently large training set I think these models can achieve human parity for a subset of language generation tasks, and be effectively of human intelligence. They nearly already have.

It doesn’t matter to me if they have “reasoning” capabilities or not if the outcome is the same.

I think we are a long ways off from AGI still.

> a subset of language generation tasks ... if the outcome is the same

Which tasks? The output of some crafter with a limited number of modules, and of somebody who can assess the output, cannot be the same - unless you would have accepted the output of a mentally crippled entity in the first place.