Hacker News
by NitpickLawyer 307 days ago
To put things into perspective: DeepMind was founded in 2010 and bought by Google in 2014, the year of this "prank". 11 years later and ... here we are.

Also, a look at how our expectations / goalposts are moving. In 2010, one of the first "presentations" Hassabis gave at DeepMind had a few slides on AGI (shown in the documentary "The Thinking Game"):

Quote from Shane Legg: "Our mission was to build an AGI - an artificial general intelligence, and so that means that we need a system which is general - it doesn't learn to do one specific thing. That's really key part of human intelligence, learn to do many many things".

Quote from Hassabis: "So, what is our mission? We summarise it as <Build the world's first general learning machine>. So we always stress the word general and learning here the key things."

And the key slide (that I think cements the difference between what AGI stood for then, vs. now):

AI - one task vs. AGI - many tasks, at human level intelligence.

----

I'm pretty sure that if we go by that definition, we're already there. I wish I had a time machine, to see Legg and Hassabis in front of Gemini 2.5/o3/whatever top model today, trained on "next token prediction" and performing on so many different levels - gold at IMO, gold at IOI, playing chess, writing code, debugging code, "solving" NLP, etc. I'm curious if they'd think the same.

But having a slow ramp-up, seeing small models get bigger, getting to play with GPT-2, then GPT-3, then ChatGPT, I think it has changed our expectations and our views on what is truly AGI. And there's a bit of that famous quote "AI is everything that hasn't been done before"...

2 comments

Back in the 90s, Pixar put out a joke SIGGRAPH paper about rendering food with lots of food-related puns and so forth. In 2007 they released Ratatouille, which required them to actually develop new rendering techniques, especially around subsurface scattering, to make food look realistic and delicious.

I don't think what we have now fits that definition. LLMs are still narrowly good at language generation, and the "many" things they're good at are things that have canonical textual / linguistic representations (code, chess notation, etc.). Much of the existing AI that appears more general is really more specific models hooked up together; for example, taking the output of an LLM and piping it into a TTS model. Since these pieces are easily replaceable, I struggle to call it one AI that can do many tasks.

Consider the human equivalent of that LLM->TTS example: when you're talking, you naturally emphasize certain words, and part of that is knowing not just what you want to say but why you want to say it. Suppose you had a machine learning model where the speech module had insight into why the language model picked the words it did, where vision told it who it's talking to so it could pick the right tone, and where the motor system had access to all of that too for gesturing, etc. At that point you'd have a single AI that was indeed generally solving a large variety of tasks. We have a little bit of that for some domains, but as it stands most of what we have is lots of specific models that we've got talking to each other, falling a little short of human level wherever the interface between them is incomplete.