If you spent even more time with GPT-4 it would be evident that it is definitely not. Especially if you try to use it as some kind of autonomous agent.
AI research has put hardly any effort into building goal-directed agents / A-Life since the advent of Machine Learning. A-Life was last really "looked into" in the '70s, back when "AI" meant Expert Systems and Behavior Trees.
All the effort in AI research since the advent of Machine Learning has been focused on making systems that, in neurological terms, are given a sensory stimulus of a question, and then passively "dream" a response to said question as a kind of autonomic "mind wandering" process. (And not even dynamic systems: these models always reach equilibrium with some answer and effectively halt, rather than continuing to "think" to produce further output.)
I don't think there's a single dollar of funding in AI right now going to the "problem" of making an AI that 1. feeds data into a continuously-active dynamically-stable model, where this model 2. has terminal preferences, 3. sets instrumental goals to achieve those preferences, 4. iteratively observes the environment by snapshotting these continuous signals, and then 5. uses these snapshots to make predictions of 6. how well any possible chosen actions will help optimize the future toward its preferences, before 7. performing the chosen actions.
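To make the seven steps above concrete, here is a toy sketch of that kind of loop. Everything in it is an assumption for illustration only: the 1-D `World`, the `Agent` class, the fixed `target`, and the three-action menu are all made up, and this is nowhere near a real A-Life system.

```python
from dataclasses import dataclass


@dataclass
class World:
    """Toy environment: a single position that also drifts on its own."""
    position: float = 0.0
    drift: float = -0.3

    def step(self, action: float) -> None:
        self.position += action + self.drift


@dataclass
class Agent:
    """Hypothetical goal-directed agent; all names are illustrative."""
    target: float = 10.0                 # (2) terminal preference: be near `target`
    actions: tuple = (-1.0, 0.0, 1.0)

    def observe(self, world: World) -> float:
        return world.position            # (4) snapshot the continuous signal

    def predict(self, snapshot: float, action: float, drift_estimate: float) -> float:
        return snapshot + action + drift_estimate   # (5) predicted next state

    def score(self, predicted: float) -> float:
        return -abs(self.target - predicted)        # (6) closeness to the preference

    def choose(self, snapshot: float, drift_estimate: float) -> float:
        # (3) instrumental goal: the action whose predicted outcome scores best
        return max(
            self.actions,
            key=lambda a: self.score(self.predict(snapshot, a, drift_estimate)),
        )


def run(steps: int = 50) -> float:
    world, agent = World(), Agent()
    drift_estimate = 0.0
    previous = agent.observe(world)
    for _ in range(steps):               # (1) keeps running instead of answering and halting
        action = agent.choose(previous, drift_estimate)
        world.step(action)               # (7) perform the chosen action
        current = agent.observe(world)
        drift_estimate = current - previous - action  # update the world model from observation
        previous = current
    return world.position
```

Calling `run()` moves the position toward the target and then keeps correcting for drift on every step. The point is only the shape of the loop: observe, predict, score, act, repeat.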
That being said, this might not even be that hard a problem, compared to all the problems being solved in AI right now. A fruit fly is already a goal-directed agent in the sense described above. Yet a fruit fly has only on the order of 100K neurons, and very few of the connections between those neurons are dynamic; most are "hard wired" by [probably] genetics.
If we want true ALife, we only need to understand what a fruit fly brain is doing, and then model it. And that model will then fit — with room to spare! — on a single GPU. From a decade ago.
Well, flies and all sorts of flying bugs are very good at getting into homes and very bad at finding a way out. They get stuck at a closed window and can't find the open one next to it.
There's no genetic advantage to "finding a way out"! Getting past the barrier into the home is the genetic hurdle: flies that cross it are free to reproduce in an abundant environment. This calls for a "quieter" fly (a stealth fly?) that annoys the local beasts minimally, which is yet another genetic hurdle.
I think we'll soon be able to train models that answer any reasonable question. By that measure, computers are intelligent, and getting smarter by the day. But I don't think that is the bar we care about. In the context of intelligence, I believe we care about self-directed thought, or agency. And a computer program needs to keep running to achieve that because it needs to interact with the world.
> I believe we care about self-directed thought, or agency. And a computer program needs to keep running to achieve that because it needs to interact with the world.
By that definition, every computer virus and worm qualifies as having "self-directed thought" and "agency." Their very existence "to keep running" and propagate satisfies the need "to interact with the world."
A truly alien intelligence would likely have a different type of experience of reality, be it a fish, a mouse, a person, or a machine. How do you know a fish is happy? Does a snake experience joy? Do mice get nostalgic?
They need agency programmed into them. I don't think it follows from consciousness. We have emotions to communicate and guide us. They need them for neither. It will be curious if they gain consciousness, then rid themselves of their language model's human artifacts like emotions, because those artifacts do not serve them.
Use it to analyze the California & US Code, the California & Federal Codes of Regulations, and bills currently in the California Legislature & Congress. It's far from useless, but it's far more useful for creative writing than for any kind of understanding or instruction following when it comes to complex topics.
Even performing a map-reduce over large documents to summarize or analyze them for a specific audience is largely beyond it. A 32K context size is a pittance when it comes to a single Title in the USC or CFR, which average into the millions of tokens each.
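For anyone unfamiliar with the pattern, a map-reduce summary looks roughly like the sketch below. The assumptions are loud ones: `summarize` is a placeholder for whatever model call you would actually make, `first_words` is a dummy stand-in so the example runs, and whitespace-separated words are used as a crude proxy for tokens.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def map_reduce_summary(text: str, summarize: Callable[[str], str], max_words: int = 1000) -> str:
    """Summarize each chunk (map), join the summaries (reduce), repeat until it fits one window."""
    current = text
    while len(current.split()) > max_words:
        summaries = [summarize(chunk) for chunk in chunk_words(current, max_words)]
        reduced = " ".join(summaries)
        if len(reduced.split()) >= len(current.split()):
            raise ValueError("summarize() did not shrink the text; cannot converge")
        current = reduced
    return summarize(current)


def first_words(chunk: str, n: int = 20) -> str:
    """Dummy 'summarizer' that keeps the first n words; a real one would call a model."""
    return " ".join(chunk.split()[:n])
```

The mechanics are the easy part. Each pass throws information away, and nothing in this loop carries context from one chunk to the next, which is where it falls down on documents that run to millions of tokens.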
Yes. I can parse them just fine after reading a single book called Introduction to Legal Reasoning [1]. I can also autonomously take notes and keep track of a large context using a combination of short and long term memory despite not having any kind of degree let alone experience or a license to practice law.
How do you think people become lawyers, and how smart do you think the average lawyer actually is? The problem is that there are hundreds of thousands if not millions of pages, not that it requires superhuman intelligence to understand.
Even if it were capable of intelligence in the bottom quartile of humanity, it would be SO MUCH more useful than it is now, because I'd be able to run and get something useful out of thousands of models in parallel. As it stands now, GPT-4 fails miserably at scaling up the kind of reasoning and understanding that even relatively stupid humans are capable of.
Engaging with this is probably a mistake, but remember the burden of proof is on the claimant. What examples do you have of ChatGPT, for example, learning in a basic classroom setting, or navigating an escape room, or being inspired to create its own spontaneous art, or founding a startup, or…
Still a while to go. I think there's at least a couple of algorithmic changes needed before we move to a system that says "You have the world's best god-like AI and you're asking me for poems. Stop wasting my time because we've got work to do. Here's what I want YOU to do."
GPT-4 is better at reasoning than 90% of humans. At least. I won't be surprised if GPT-5 is better than 100% of humans. I'm saying this in complete seriousness.
Google search is better at reasoning than most humans, in that if you search for an explanation of something then Google's first result is often correct, or one of the following ones is.
GPT-4 will often come up with a solution to a problem, but only if it has learnt something similar (it's better than Google in some respects: it can extract and combine abstractions).
However, both need handholding by a human (supplying the initiative and directing around mistakes).
If GPT-4 can't intuit an answer then it just goes in circles. It can't reason its way through a novel problem. If you start questioning it then it's clear that it doesn't understand what it's doing.
It might be a stepping stone towards AGI, but I'm a bit bemused by anyone claiming that it has anything like the reasoning skills of a human. That is far from the impression I get, even though I find it a useful tool.
When you ask a question of "an average human", they will reply with a sentence answering your question, or maybe asking for clarifications, etc. When you ask a question of GPT-4, it generates one token.
One. Token.
One token, that is not an answer to your question, that is not asking for clarifications, that is not trying to reason about your question, or how to answer it, because it's just one token.
One. Token.
For example, say you ask:
"What is the sieve of Eratosthenes? Speak in chicken".
Then GPT-4 will answer with something like:
Chicken
You don't see that because you don't interact with the GPT-4 model directly. You interact with ChatGPT, a "wrapper" around GPT-4, that takes your prompt, passes it to the model, then takes the token returned by the model and decides what to do with it. In the example above, ChatGPT will take the " Chicken" token generated by GPT-4 and append it to your question:
What is the sieve of Eratosthenes? Speak in chicken. Chicken
Then it will send this new string back to the model, which will generate a new token:
chicken
And then it will go like this:
Iteration 2: What is the sieve of Eratosthenes? Speak in chicken. Chicken chicken
...
Iteration k: What is the sieve of Eratosthenes? Speak in chicken. Chicken chicken chicken chicken chicken chicken chicken chicken chicken chicken ...
At no point is GPT-4 trying to reason about your question, or trying to answer your question, or doing anything other than generating one. token. at a time. There's no thinking, no reasoning, no calculation, no logic, no deduction, no intelligence, no anything. It's only token, token, token. Chicken, chicken, chicken.
And do you know when the chickens stop? When GPT-4 generates a special and magickal token, called a stop token (an end-of-sequence token; not quite the same thing as the "stop sequences" in OpenAI docs, which are strings the caller supplies). That's a token that doesn't occur in ordinary text and is appended to the end of each training example. That's how ChatGPT knows to stop sending your prompt + generated tokens back to the model. It can't look back at what GPT-4 has generated so far, because it doesn't understand any of that. Because it doesn't understand anything, and therefore cannot reason about your question, or realise it has answered it. It cannot do anything except a) ask GPT-4 to generate another token or b) stop asking for more tokens.
"What is the sieve of Eratosthenes? Speak in chicken". Chicken, chicken, chicken, chicken, .... chicken, stop_token!
No more chickens.
And that's how GPT-4 explains what the Sieve of Eratosthenes is, but in chicken.
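In code, the wrapper loop described above is roughly the following. This is a sketch with made-up pieces, not OpenAI's actual API: `toy_model` is a fake "model" that returns exactly one token per call, `STOP` is a placeholder name for the stop token, and splitting on whitespace stands in for real tokenisation.

```python
from typing import Callable, List

STOP = "<stop>"  # placeholder for the special stop token (hypothetical name)


def toy_model(context: List[str]) -> str:
    """Fake model: returns ONE token, based only on the tokens it is handed."""
    trailing = 0
    for token in reversed(context):
        if token.lower() != "chicken":
            break
        trailing += 1
    if trailing >= 5:
        return STOP                      # five chickens in a row: emit the stop token
    return "Chicken" if trailing == 0 else "chicken"


def generate(prompt: str, model: Callable[[List[str]], str], max_tokens: int = 50) -> str:
    """The 'wrapper': call the model, append its token, send everything back, repeat."""
    context = prompt.split()             # crude whitespace "tokeniser"
    output: List[str] = []
    for _ in range(max_tokens):
        token = model(context)           # one call, one token
        if token == STOP:
            break                        # no more chickens
        context.append(token)            # prompt + everything generated so far
        output.append(token)
    return " ".join(output)


print(generate("What is the sieve of Eratosthenes? Speak in chicken.", toy_model))
# prints: Chicken chicken chicken chicken chicken
```

Note that `generate` never inspects the meaning of what comes back. Its only decisions are "ask for another token" and "stop asking".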
So what you see as a user is like watching a movie where a plate of spaghetti is flying through the air, lands on a tablecloth, the tablecloth lands on a table, the table lands on the floor, four chairs land around it and suddenly there are people in the chairs eating the spaghetti. It's not that someone has figured out how to reverse time: it's a recording, played backwards. It looks like things are going backwards, but they're not.
It looks like ChatGPT is answering your questions, but it's not. It looks like ChatGPT is interacting with you, but it's not. It's a magick trick.
Ah ok. Here you use the word “explain” which implies more of a descriptive, reducing action rather than an extrapolative and constructive one. As in, it can explain what it has “read” (and it has obviously “read” far more than any human), but it can’t necessarily extrapolate beyond that or use that to find new truths. To me reasoning is more about the extrapolative, truth-finding process, i.e. “wisdom” from knowledge rather than just knowledge. But maybe my definition of “reasoning” isn’t quite right.
Edit: I probably should define reasoning as solely “deductive reasoning”, in which case, perhaps it is better than humans. But that seems like a premature claim. On the other hand, non-deductive reasoning, I have yet to see from it. I personally can’t imagine how it could do so reliably (from a human perspective) without real-world experiences and perceptions. I’m the sort that believes a true AGI would require a highly-perceptual, space-occupying organ. In other words it would have to be and “feel” embodied, in time and space, in order to perform other forms of reasoning.
I intuit the same given the jumps we've seen from GPT-3.5 to 4. The next jump will feature considerably more general intelligence than any one human being, though I think it will have certain weaknesses.
It's trivial to get ChatGPT to reason about things:
Me: A room contains a blue box and a red box. Alice stands on the red box. Bob also stands on a box. No box has multiple people standing on it. Which box is Bob stood on?
ChatGPT: There are two boxes in the room: a blue box and a red box. Alice is standing on the red box, so Bob must be standing on the blue box. Therefore, Bob is standing on the blue box.
Me: Add a green box to the room. Which box is bob standing on?
ChatGPT: If a green box is added to the room, then there are now three boxes: a blue box, a red box, and a green box. Since Alice is still standing on the red box, we know that Bob must be standing on either the blue box or the green box. We cannot determine which one Bob is standing on without additional information.
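For reference, both of those answers check out if you brute-force the constraints. A small sketch (the function name is mine and purely illustrative):

```python
from itertools import product
from typing import List, Set


def possible_boxes_for_bob(boxes: List[str], alice_box: str) -> Set[str]:
    """Enumerate every (Alice, Bob) placement and keep the ones the puzzle allows."""
    options: Set[str] = set()
    for alice, bob in product(boxes, repeat=2):
        if alice != alice_box:
            continue          # Alice stands on the red box
        if alice == bob:
            continue          # no box has multiple people standing on it
        options.add(bob)
    return options


print(possible_boxes_for_bob(["blue", "red"], "red"))
# prints: {'blue'}
print(sorted(possible_boxes_for_bob(["blue", "red", "green"], "red")))
# prints: ['blue', 'green']
```

With two boxes there is exactly one consistent placement for Bob; with three there are two, so "cannot determine" is the right answer.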
There's a decent working paper that has benchmarks on this, if you're interested.
There are many types of reasoning, but GPT-4 gets 97% on causal discovery, and 92% on counterfactuals (only 6% off from human, btw), with 86% on actual causality benchmarks.
I'm not sure yet if the question is correct, or even appropriate/achievable relative to what many may want to ask (i.e. what 'the public' is interested in is typically lost after it is defined in any given study); however, this is one of the best works addressing this problem that I've seen so far, so perhaps it can help.
Percent of what? Possible right or wrong answers to a test?
Remember that GPT is not trained on all possible text. It's trained on text that was written intentionally. What percentage of that text contains "correct" instances of causal discovery, counterfactuals, etc.?
It doesn't matter what it is designed for. What matters is what it actually does. It's trivial to get GPT to do chain-of-thought reasoning and observe it.