I agree, but with the caveat that it's getting harder and harder with all the hype / doom cycles and all the goalpost moving that's happening in this space.
IMO if you took gemini2.5 / claude / o3 and showed it to people from ten / twenty years ago, they'd say that it is unmistakably AGI.
There's no way to be sure in either case, but I suspect their impressions of the technology ten or twenty years ago would not be so different from my experience of first using LLMs a few years ago...
Which is to say, complete amazement, followed quickly by seeing all the many ways in which it absolutely falls flat on its face, revealing the lack of actual thinking. That situation hasn't fundamentally changed since then.
Yes, that is the same feeling I have.
Giving it some JSON and describing how a website should look? Super fast results and amazing capabilities.
Trying to get it to translate my unit tests from xUnit to TUnit, where the latter is new and does not have a ton of blog posts? Forget about it. The process is purely mechanical and it is easy after RTFM, but it falls flat on its face.
Although I think if you asked people 20 years ago to describe a test for something AGI would do, they would be more likely to say “writing a poem” or “making art” than “turning xUnit code to TUnit”.
IMO, if you said to someone in the 90s, “Well, we invented something that can tell jokes, make unique art, write stories and hold engaging conversations, although we haven’t yet reached AGI because it can’t transpile code accurately. I mean, it can write full applications if you give it some vague requirements, but they have to be reasonably basic: the sort of thing a junior dev could write in a day, it can write in 20 seconds. So, not AGI,” they would say, “Of course you have invented AGI, are you insane!!!”
LLMs to me are still a technology of pure science fiction come to life before our eyes!
Tell them that humans need to babysit it and double-check its answers to get anything done, since it isn't as reliable as a human, and no, they wouldn't have called it AGI back then either.
The whole point of AGI is that it is general like a human. If it has such glaring weaknesses as current AI has, it isn't AGI, and it was the same back then. That an AGI can write a poem doesn't mean being able to write a poem makes it an AGI; it's just an example of something AI couldn't do 20 years ago.
Why do human programmers need code review then if they are intelligent?
And why can’t expert programmers deploy code without testing it? Surely they should just be able to write it perfectly first time without errors if they were actually intelligent.
Which part of "General Intelligence" requires replacing white collar workers? A middle schooler has general intelligence (they know about and can do a lot of things across a lot of different areas) but they likely can't replace white collar workers either. IMO GPT-3 was AGI, just a pretty crappy one.
> A middle schooler has general intelligence (they know about and can do a lot of things across a lot of different areas) but they likely can't replace white collar workers either.
Middle schoolers replace white-collar workers all the time. It takes 10 years for them to do it, but they can do it.
No current model can do the same since they aren't able to learn over time like a middle schooler.
Compared to someone who graduated middle school on November 30th, 2022 (2.5 years ago), would you say that today's gemini 2.5 pro has NOT gained intelligence faster?
I mean, if you're a CEO or middle manager and you have the choice of hiring this middle schooler for general office work, or today's gemini-2.5-pro, are you 100% saying the ex-middle-schooler is definitely going to give you the best bang for your buck?
Assuming you can either pay them $100k a year, or spend the $100k on gemini inference.
> would you say that today's gemini 2.5 pro has NOT gained intelligence faster?
Gemini 2.5 pro the model has not gained any intelligence since it is a static model.
New models are not the models learning; it is humans creating new models. The trained models have access to all the same material and knowledge a middle schooler has as they go on to learn how to do a job, yet they fail to learn the job while the kid succeeds.
You and I could sit behind a keyboard, role-playing as the AI in a reverse Turing test, typing away furiously at the top of our game, and if you told someone that their job is to assess our performance (thinking they're interacting with a computer), they would still conclude that we are definitely not AGI.
This is a battle that can't be won at any point because it's a matter of faith for the forever-skeptic, not facts.
> Have you not experienced being on the receiving end of such accusations?
No, I have not been accused of being an AI. I have seen people who format their texts get accused due to the formatting sometimes, and thought people could accuse me for the same reason, but that doesn't count.
> I think this demonstrates the same point.
You can't detect general intelligence from a single message, so it doesn't really. People accuse you of being an AI based on the structure and word usage of your message, not the content of it.
> People accuse you of being an AI based on the structure and word usage of your message, not the content of it.
If that's the real cause, it is not the reason they give when making the accusation. Sometimes they object to the citations, sometimes the absence of them.
But it's fairly irrelevant, as they are, in fact, saying that real flesh-and-blood me doesn't pass their purity test for thinking.
Is that because they're not thinking? Doesn't matter — as @sebastiennight said: "This is a battle that can't be won at any point because it's a matter of faith for the forever-skeptic, not facts."
When it can replace a polite, diligent, experienced 120 IQ human in all tasks. So it has a consistent long-term narrative memory, doesn't "lose the plot" as you interact longer and longer with it, can pilot robots to do physical labor without much instruction (the current state of the art is not there; a trained human will still do much better), can drive cars, can generate images without goofy non-human-style errors, etc.
Indeed, on both. Even IQ 85 would make a painful dent in the economy via unemployment statistics. But the AI we have now is spiky, in ways that make it trip up over mistakes that even slightly-below-average humans would not make, even though it can also do Maths Olympiad puzzles, the bar exam, leetcode, etc.
The emotional way that humans think when buying products is similarly unfair. Only the 90th percentile is truly 'satisfactory'. The implied question is: when would Joe Average and everyone else stop moving the goal posts on the question "Do we have AI yet?"
ASI is, by definition, Superintelligence, which means it is beyond practical human IQ capacity. So something like 200 IQ.
Again, you might call it 'unfair', but that's also when the goal posts will stop being moved; otherwise, Joe Midwit will say 'it's only as smart as some smart dudes I know'.
I still can't have an earnest conversation or bounce ideas off of any LLM - all of them seem to be a cross between a sentient encyclopedia and a constraint solver.
They might get more powerful but I feel like they're still missing something.
Why are you not able to have an earnest conversation with an LLM? What kind of ideas are you not able to bounce off LLMs? These seem to be the type of use cases where LLMs have generally shined for me.
Eh, I am torn on this. I had some great conversations on random questions or conceptual ideas, but also some where the model's instructions shone through way too clearly. Like, when you ask something like "I’m working on the architecture of this system, can you let me know what you think and if there’s anything obvious to improve on?", the model will always a) flatter me for my amazing concept, b) point out the especially laudable parts of it, and c) name a few obvious but not-really-relevant parts (e.g. "always be careful with secrets and passwords").
However, it will not actually point out higher level design improvements, or alternative solutions. It’s always just regurgitating what I’ve told it about. That is semi-useful, most of the time.
Because it spits out the most probable answer, which is based on endless copycat articles online written by marketers for C-level decision makers to sell their software.
AI doesn't go and read a book on best practices, come back saying "Now I know Kung Fu of Software Implementation", and then think critically through your plan step by step before providing an answer. These systems, for now, don't work like that.
The "meaningless praise" part is basically American cultural norms trained into the model via RLHF. It can be largely negated with careful prompting, though.
I felt this way until I tried Gemini 2.5. IMO it fully passes the Turing test unless you're specifically utilizing tricks that LLMs are known to fall for.
I suspect everyone will call it a stochastic parrot because it got this one thing wrong. And this will continue into the far, far future; even when it becomes sentient, we will completely miss it.
Its generalization capabilities are a bit on the low side, and memory is relatively bad. But it is much more than just a parrot now: it can handle some basic logic, but cannot follow given patterns correctly for novel problems.
I'd liken it to something like a bird, extremely good at specialized tasks but failing a lot of common ones unless repeatedly shown the solution. It's not a corvid or a parrot yet. Fails rather badly at detour tests.
It might be sentient already, though. Someone needs to test whether it can tell itself apart from another instance of itself in its own work.
People already share viral clips of AI recognising other AI, but I've not seen a real scientific study of whether this is due to a literary form of passing a mirror test, or whether it's related to the way most models openly tell everyone they talk to that they're an AI.
I don't want to say any of these are exactly equivalent to any given aspect of human memory, but I would suggest that LLMs behave kinda like they have:
(1) Sensory memory in the form of a context window — and in this sense are wildly superhuman because for a human that's about one second, whereas an AI's context window is about as much text as a human goes through in a week (actually less because we don't only read, other sensory modalities do matter; but for scale: equivalent to what you read in a week)
(2) Short term memory in the form of attention heads — and in this sense are wildly superhuman, because humans pay attention to only 4–5 items whereas DeepSeek v3 defaults to 128.
(3) The training and fine-tuning process itself that allows these models to learn how to communicate with us. Not sure what that would count as. Learned skill? Operant conditioning? Long term memory? It can clearly pick up different writing styles, because it can be made to controllably output in different styles, but that's an "in principle" answer. None of Claude 3.7, o4-mini, or DeepSeek r1 could actually identify the authorship of a (n=1) test passage I asked 4o to generate for me.
Similarity match. For that you need to understand reflexively how you think and write.
It's a fun test to give a person something they have written but do not remember. Most people can still spot it.
It's easier with images though. Especially a mirror.
For DALL-E, the test would be whether it can discern its own work from a human-generated image.
Especially if you give it an imaginative task like drawing a representation of itself.
Well, I'm too lazy to look up how many weavers were displaced back then and that's why I said a lot. Maybe all of them, since they weren't trained to operate the new machines.
Anyway, sorry for the digression. My point is that an LLM replacing white-collar workers doesn't necessarily imply it's generally intelligent -- it may be, but doesn't have to be.
Although if it gets to a point where companies are running dark office buildings (by analogy with dark factories) -- yes, it's AGI by then.
This is actually how a Supreme Court justice defined the test for obscenity.
> The phrase "I know it when I see it" was used in 1964 by United States Supreme Court Justice Potter Stewart to describe his threshold test for obscenity in Jacobellis v. Ohio
The reason it's so famous, though (and why some people tend to use it in a tongue-in-cheek manner), is that "you know it when you see it" is a hilariously unhelpful and capricious threshold, especially when coming from the Supreme Court. For rights which are so vital to the fabric of the country, the Supreme Court recommending we hinge free speech on, essentially, unquantifiable vibes is equal parts bizarre and out of character.
my 2c on this is that if you interact with any current llm enough you can mentally 'place' its behavior and responses. when we truly have AGI+/ASI my guess is that it will be like that old adage of blind men feeling & describing an elephant for the first time. we just won't be able to fully understand its responses. there would always be something left hanging and then eventually we'll just stop trying. that would be the time when the exponential improvement really kicks in.
it should suffice to say we are nowhere near that and I don't even believe LLMs are the right architecture for that.