Hacker News
by simon_000666 1172 days ago
ChatGTP/4 is to AGI what pepper’s ghost is to holography.

It’s a parlor trick, even if you add plugins or the ability to call other Hugging Face ML models - it’s just a parlor trick with fancier bells and whistles. All it is doing is using stochastic gradient descent to predict the next word in a sequence based on an enormous sophisticated training set designed to amaze people.

Thinking it has advanced because it can now get calculations correct is a fallacy. It’s still just predicting the next word, it’s just that it’s now got a post-processing step that is converting those next words into code and parroting the output. It may now be able to answer 4567*9876 correctly (using the human hardcoded Wolfram Alpha engine) but it still does not fundamentally comprehend why 1+1=2 - like my 5 year old can.

Until it can generate its own internal neural networks to, for example, learn to reason logically about calculations, we are still far from AGI. Also those calling for more data are misguided - less data, more sophisticated architectures than transformers are the only way to avoid the stochastic parrot trap.

10 comments

I find this so bizarre. Every time someone demonstrates a new way in which models are capable of a wider array of tasks than expected someone goes "it's just predicting tokens".

It's such a big "just". You are just firing neurons. The stock market is just supply and demand. The internet is just a bunch of computers talking through 50 year old protocols that don't work very well.

Everything is just something else! I wonder if the first tribe to be annihilated by bronze weapons were like "that stuff is just like stone but more malleable, don't see what the big deal is".

Stavros' law of AGI: If we know how it works, it's not true AGI.
Pepper’s ghost is also impressive when you see it for the first time. They’ve enhanced it to do entire concerts now with dead music stars on stage for huge audiences. Has it helped us get any closer to solving holography? Will I be able to have a Star Trek style hologram roaming round my house because of pepper’s ghost?
It's not a big just. Saying it is AGI is an insanely huge claim. Don't flip it around and say the skeptic is the one making a large claim. They aren't!
I asked chatGPT why it kept apologising and told it to not apologise to me.

Guess what, it apologised immediately after and then again when I asked why it apologised even after I told it not to.

That’s pretty common in Japan, from what I’ve heard. Cultural upbringing is hard to distance yourself from.
Is chatgpt Japanese?
Guess what, I just saw one of those idiots from the bronzeworking tribe with a BENT sword. Imagine using weapons with blades that can get bent.
Except "this is just" is sprinkled all over NNs, DL and, in turn, ChatGPT. Actually they pride themselves on "this is just".

So your argument is probably more accurate for the other camp, or at least as accurate for the other camp as well.

I'm not sure what you're getting at here but I'll try to respond. My argument is that "this is just" is meaningless as a way to assess the impact of a technology.

If AI researchers say, "this is just X and it can do Y!" then fine, that's just framing for "look: Y". When stochastic parrot guys say "this is just X, what's impressive about that?" it throws me for a loop coz they are refusing to engage with Y.

I think we disagree about what Y is. My point is that Y is not that different from materially what is possible with a slack bot from circa 2015. Essentially chatgtp is a less efficient way to get to the same outcomes that were already possible. The trick is that it appears to be something it’s not - AGI.

I like your bronze sword analogy. From my point of view chatgtp is not a bronze sword, it’s a Stone Age sword that someone has painted bronze. It has value because people realize the advantage that a true bronze sword would have in a battle. However, when you actually put it through its paces you quickly realise it offers no actual value over what came before.

>> It's still just predicting the next word

Predicting the next word is a much deeper problem than people like you realise. To be able to be good at predicting the next word you need to have an internal model of the reality that produced that next word.

GPT-4 might be trained at predicting the next word, but in that process it learns a very deep representation of our world. That explains how it has an intuition for colours despite never having seen colours. It explains why it knows how physical objects in the real world interact.

Now, if you disagree with this hypothesis it's very easy to disprove it by presenting a problem to GPT4 that is very easy for humans to solve but not for GPT4. Like the Yann LeCun gear problem, which GPT4 is also able to solve.

“To be able to be good at predicting the next word you need to have an internal model of the reality that produced that next word.”

Now that’s an interesting claim - that I would deeply dispute. It learns from text. Text itself is a model of reality. So chatgtp if anything proves that in order to be good at predicting the next word all you need is a good model of a model of reality. GTP knows nothing of actual reality only the statistics around symbol patterns that occur in text.

You are being given a chance to dispute it. Give an example of a problem that any human would be easily able to solve but GPT4 wouldn't.

>> "good model of a model of reality"

That is just a model of reality. Also, a "model of reality" is what you'd typically call a world model. It's an intuition for how the world works, how people behave, that apples fall from trees and that orange is more similar to red than it is to grey.

Your last line shows that you still have a superficial understanding of what it's learning. Yes it is statistics, but even our understanding of the world is statistical. The equations we have in our head of how the world works are not exact, they're probabilistic. Humans know that "Apples fall from the _____" should be filled with 'tree' with a high probability because that's where apples grow. Yes, we have seen them grow there, whereas the AI model has only read about them growing on trees. But that distinction is moot because both the AI model and humans express their understanding in the same way. The assertion we're making is that to be able to predict the next word well, you need an internal world model. And GPT4 has learnt that world model well, despite not having sensory inputs.
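For anyone following along, the "Apples fall from the _____" point can be made concrete with a toy next-word predictor. This is only a sketch under heavy assumptions: a bigram counter over a three-sentence made-up corpus, with hypothetical helper names (`train_bigram`, `next_word_probs`). It is nowhere near how GPT-4 works internally; it just shows what "fill the blank with the most probable word" means mechanically.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw follow-up counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()} if total else {}

# Made-up toy corpus, purely for illustration.
corpus = [
    "apples fall from the tree",
    "leaves fall from the tree",
    "rain falls from the sky",
]

model = train_bigram(corpus)
probs = next_word_probs(model, "the")
best_guess = max(probs, key=probs.get)
print(probs, best_guess)
```

A model like this only knows which words tend to follow "the" in its corpus. Whether scaling the same objective up forces something like a world model is exactly what is being argued in this thread.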

Can chatgtp ride a bicycle? Can you ride a bicycle? If you’d never ridden a bicycle before - do you think if you read enough books on bicycle riding, the physics of bicycle riding, the physics of the universe - you would have anywhere near as complete a model of bicycle riding as someone who’d actually ridden a bicycle before? Sure you’d be able to talk a great game about riding bicycles - but when it comes to the crunch, you’d fall flat on your face. That’s because riding a bicycle involves a large number of incredibly complex emergent control phenomena embedded within the marvel of engineering that is the human body - not just the small part of the brain that handles language. So call me when LLMs can convert their ‘world models’ learned from statistics on human language use into being able to ride a bicycle first time. Until then I feel comfortable in the knowledge they know virtually nothing of our objective reality.
Could Stephen Hawking ride a bicycle?
Yes, his MND was diagnosed around the age of 21? And he didn’t learn to ride bicycles from reading books.
Your 5yo does not understand 1+1. You yourself do not understand it. Entire careers were spent trying to pin it down. It is basically its own branch of mathematics.

I understand your point, but I am struggling to see why it matters. This seems more and more an argument like “cars are not horses”. I know they are not but does it matter? Cars are superior for our use cases.

And while it may be true that it is far from AGI, I don’t think calling it a parlor trick does it justice. I used it this morning to set up a new workout routine for myself after having it write a little boilerplate TypeScript code to bootstrap 70% of a microservice I want to set up. My girlfriend who is studying React got a lot of value out of it by having compile errors explained to her. My mum uses it to practice English. I am going to integrate GPT-4 into a new product where it provides tangible value for non-technical users. To be useful it does not need to be sentient or able to iterate on its own architecture.
Yeah I agree that’s fair, a parlor trick is perhaps a little harsh. ChatGTP can provide value - it’s arguable whether doing that with ‘classical’ methods could have been more efficient or whether the end result is as good - (btw careful with code - in my experience ChatGTP often thinks it knows what is wrong but is way off - something an experienced coder would notice immediately). Do you remember the tamagotchi? That also provided value to millions of people, many people thought of it as sentient even - was it? No - was it anywhere near AGI? No. If we can find good uses for the GTP models that were not possible or were cost prohibitive before - then great. I think we just need to be clear - like the Tamagotchi - this is far from AGI and plugins/Hugging Face is not the penultimate step before skynet.
Weird behaviour I’ve noticed is a lot of folks on the unimpressed/doomism side of AI consistently say GTP instead of GPT, I wonder why this pattern exists?
> less data, more sophisticated architectures

“The bitter lesson” would like to have a word. http://www.incompleteideas.net/IncIdeas/BitterLesson.html

I appreciate your enthusiasm, but the history of ML shows that your approach is less likely to work. Maybe you’ll be the one to prove everyone else wrong. Architectural breakthroughs are few and far between, and it’s incredibly difficult to reason about. I came up with the Lion optimizer while Google was using random tree search across 300 TPUs to discover the same thing, and it’s just five lines or so.
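To give a feel for the "five lines or so" remark, here is a rough sketch of a Lion-style update as I understand it from the published description: take the sign of an interpolation between momentum and gradient, apply decoupled weight decay, then refresh the momentum. The function name `lion_step`, the plain-list parameter representation and the hyperparameter values are illustrative assumptions, not a reference implementation from any library.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
    """One Lion-style step over plain Python lists of floats (illustrative only)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Direction is only the sign of the interpolated momentum/gradient.
        update = sign(beta1 * m + (1 - beta1) * g)
        # Decoupled weight decay is added to the signed update.
        new_params.append(p - lr * (update + weight_decay * p))
        # Momentum is tracked with a separate, slower coefficient.
        new_momentum.append(beta2 * m + (1 - beta2) * g)
    return new_params, new_momentum

# Hypothetical values just to exercise the function.
params, momentum = lion_step([1.0, -2.0], [0.5, -0.3], [0.0, 0.0], lr=0.1)
print(params, momentum)
```

Every coordinate moves by the same magnitude `lr` regardless of gradient size, which is what keeps the rule so short.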

Is this some kind of copypasta? Too many tropes all at once. "GTP" on top of all this is too on the nose.
Ha I just commented above at the pattern of people in this camp using “GTP” fairly consistently.

What a curious psychological study, maybe dyslexic people feel more threatened by a large language model so clearly understanding words that they’re more likely to attempt to discredit it?

If it wasn't, it is now.
Evolution is just gene selection through natural selection. To create an eye is not possible

Well neural networks have unpredicted emergent properties. I don't see how anyone can rule out or know future behaviour

> It's still just predicting the next word.

Computer-generated random numbers are not truly random, yet they are practically random in most real-world use cases. You can’t easily cheat the RNG in World of Warcraft to get critical strike every time.
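A small sketch of what "not truly random, yet practically random" means. It assumes Python's standard `random` module and an invented `roll_crits` helper with a made-up 25% crit chance; it says nothing about how World of Warcraft actually rolls, which happens server-side and isn't something I know the details of.

```python
import random

def roll_crits(seed, n=10, crit_chance=0.25):
    """Simulate n attacks with a seeded pseudo-random generator (hypothetical helper)."""
    rng = random.Random(seed)  # same seed -> same sequence, every time
    return [rng.random() < crit_chance for _ in range(n)]

first = roll_crits(seed=42)
second = roll_crits(seed=42)

# Fully deterministic if you know the seed and the algorithm...
print(first == second)
# ...but a player who sees only the outcomes, not the seed, cannot steer them.
print(first)
```

The sequence is completely determined by the seed, yet without access to the seed it is unpredictable enough for the use case.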

The output from GPT is generally very intelligent and versatile in terms of text. It may even be capable of handling more multi-modal problems with the use of enough sensors and motors. Perhaps the same idea of "predicting the next move" or "predicting the next idea" can still apply.

Who knows, maybe humans are essentially physical creatures that "generate the next thought and generate the next move"?

One of the biggest issues with GPT is its lack of mid-term memory like humans have. Instead, we need a vector store and search, then bolt the results back onto its short-term memory, instead of letting it handle everything in a more coherent way. Perhaps it could benefit from lightweight fine-tuning technologies like LoRA and hypernetworks, as used for Stable Diffusion. If this issue is resolved, it will become even more practical. Again, the flaw is not about "predicting the next words".
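The "vector store and search, then bolt it back on" workaround looks roughly like the sketch below. Everything in it is an assumption for illustration: the three-dimensional vectors are hand-written stand-ins for real embeddings, and `cosine`, `top_k` and the prompt layout are hypothetical names, not any particular library's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(store, query_vec, k=2):
    """Return the k stored notes whose vectors are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(item[1], query_vec), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hand-made "embeddings"; a real system would get these from an embedding model.
store = [
    ("user prefers TypeScript", [0.9, 0.1, 0.0]),
    ("user is training for a 10k run", [0.0, 0.2, 0.9]),
    ("user's service uses PostgreSQL", [0.8, 0.3, 0.1]),
]

query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the new question
memories = top_k(store, query_vec, k=2)

# The retrieved notes are pasted back into the context window.
prompt = "Relevant notes:\n" + "\n".join(memories) + "\n\nQuestion: which language should the service use?"
print(prompt)
```

The model itself never remembers anything here. The retrieval step decides what gets re-fed into the prompt, which is the bolt-on feel the comment is complaining about.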

I don't think whether it's AGI or not actually matters when it starts materially affecting the economy.
+10 Very well-said (and to-the-point).
Setting aside whether you're right or wrong about this... assuming you are right, then are you worried this will set everyone down the wrong path? That we'll spend ten years iterating on transformer models, never getting any closer to AGI? Is there another direction you think we should be moving toward instead (or at least simultaneously)?