| HN Mirror

by galaxytachyon 1166 days ago

GPT is a breakthrough in many more ways than just being an advanced LLM. GPT3 was released a year or so ago and technically, it was outclassed quite a bit by PaLM from Google in terms of parameter count and by Chinchilla in terms of training. What was amazing was that they managed to build a scalable system from it, capable of serving millions of users at the same time, and for free. The engineering and backend work must have been astounding, and I'd argue that was the secret sauce behind the success of ChatGPT.

They did not need to dumb it down or cut corners anywhere. The early releases of ChatGPT and Bing Chat showed they literally put unmodified SOTA models in the hands of users with no price tag attached. These AIs had been known for a long time, but only to some people. Remember how a bunch of billionaires suddenly got concerned about AI a year or two ago? I bet they got early access to these LLMs. But only by scaling it up could they explore the deeper depths of these models, discover new emergent abilities, and realize how much progress they had actually made. Before, people didn't really expect an LLM to play chess and simulate world models. Now they've found out these things are probably closer to AGI than they thought, and the progress bar got pushed forward.

Basically my rant is that current progress was made over a long time and people just didn't really realize how far they have come until they opened it up to the public. I would not expect many more surprises on the scale of ChatGPT in the future. If I am wrong, though, then we will actually start getting serious candidates for AGI.

whimsicalism 1166 days ago

> Basically my rant is that current progress was made over a long time and people just didn't really realize how far they have come until they opened it up to the public.

As someone in the field, I largely agree with this take, but we're still talking about progress over the course of 5 years or so.

Also, fine-tuning/RLHF considerably improved the usability of the models for the lay public, and it hasn't been around for that long.

jmerz 1166 days ago

It's crazy how fast humans normalize and adapt. Five years ago this stuff was science fiction, now we're arguing about how it's wrong sometimes.

We just kind of walked straight past the Turing test and nobody cares.

atleta 1161 days ago

A very long time ago (well before even advanced computer vision was a thing, around the time Deep Blue beat Kasparov), I had a teacher at university who, when he talked about AI and the future, warned us to take note of a pattern: as long as machines cannot do something, we keep saying that you need to be intelligent to do it, and once machines can do it, we say that no intelligence is needed (because machines can do it). As a result, we may never admit that AI is actually intelligent, or at least not until it's much better than us.

On a side note, I think the reason behind the first part of the phenomenon (that we think you need intelligence, and probably general intelligence, to do most things that machines cannot) is that that is the way we do it. So until we could build machines that could calculate (say, multiply and divide), we thought you needed actual intelligence, because we didn't know any other way and that is how we did it. (I remember that calculating and calculators were an actual example in that lesson.) Same thing for chess. And yes, we were right to eventually say that you need general intelligence for neither chess nor calculating, because we could come up with relatively simple algorithms.

But a few years ago people started to say this about Go. I do remember that after Deep Blue beat Kasparov, everybody was like "yeah, but you need intelligence for Go, because that's a game with vastly more possible moves". And in a sense this is what we saw, because AlphaGo was indeed a kind of AI, but people still started saying that you don't need intelligence to win at Go, as long as you are a machine.

Now people have started to make up shit about how GPT is just generating text and that's not intelligent (some people, obviously ignorant laypeople, even say that it's just copy-pasting text together, and other nonsense). Despite that, the freaking thing passes high-level exams aimed at people. (Of course, you can say that's still not a sign of intelligence, because the exams don't measure that: we already know people are more or less intelligent, and the exam measures knowledge. But the amazing thing here is not the knowledge part; it's being able to answer questions aimed at people.)

sebzim4500 1166 days ago

>Basically my rant is that current progress was made over a long time and people just didn't really realize how far they have come until they opened it up to the public.

I'm sure that's true for lots of people but I doubt it's true for Hinton. He seems genuinely surprised by the success of large language models.

galaxytachyon 1166 days ago

Perhaps. But before ChatGPT, I doubt anyone fully knew how capable an LLM could be. There is a plethora of papers, published after ChatGPT went public, detailing all the cool and unexpected abilities of LLMs, things that surprised even their creators.

Before, it was easy to see how good these things were at generating text and handling text-based tasks. But what made them a topic in AGI discussions is how they generalize beyond text and reach into different domains of expression despite never being trained on those. Abilities like playing chess, or simulating emotions and manipulation, emerged spontaneously, and it wasn't until now that such events were documented. Nobody saw or expected that.

whimsicalism 1166 days ago

> Nobody saw or expected that.

I don't think this is true - at least not for people in the know in the last 3-4 years or so.

galaxytachyon 1166 days ago

That would be interesting. If they knew about it, shouldn't they have published something? Wouldn't everyone see the importance of being first with a paper about an AI that can do things it wasn't trained to do?

Was it just corporate secrets or they don't want bad press? The Lemoine incident at Google and how Bing Chat turned into an obsessive lover made me think even those who worked with these AIs didn't really consider the full capabilities of their systems.

whimsicalism 1166 days ago

I suppose it depends on what you mean by "an AI that can do things it wasn't trained to do"!

In some sense, the AI is still 'just doing what it was trained to do', in that it is 'just' predicting the next word. All examples of impressive AI behavior boil down to the AI doing what it was trained to do (picking the next word, give or take some RLHF tuning) very well.
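
To make "just predicting the next word" concrete, here is a toy sketch: a bigram word counter with greedy decoding. It is vastly simpler than a transformer LLM (no neural network, no context beyond the previous word), and the function names and tiny corpus are made up for illustration. The only point it shows is that the training objective is "which word comes next", and that generation is nothing more than repeated next-word prediction.

```python
from collections import Counter, defaultdict
from typing import Optional


def train_bigram(corpus: str) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it never had one."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]


def generate(counts: dict[str, Counter], start: str, n: int) -> str:
    """Greedily append up to `n` predicted words, one at a time."""
    out = [start]
    for _ in range(n):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)


counts = train_bigram("the cat sat on the mat the cat ate")
print(generate(counts, "the", 3))  # the cat sat on
```

A real LLM replaces the count table with a neural network conditioned on the whole preceding context, but the loop that turns next-word predictions into text has the same shape.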

If you mean complex behavior arising out of what seems like a very simple unsupervised learning task, then this behavior has been known for a while (although not at this scale).

For example, I distinctly remember my 2018 grad class on deep NLP, where a guest lecturer from OpenAI (Alec Radford) demonstrated how their model got SOTA on summarization tasks just by taking the original text, appending "tl;dr", and using what the model produced after that token as the summary. It wasn't trained on a supervised summarization task; it just learned summarization incidentally from its unsupervised task.
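
The "tl;dr" trick is simple enough to sketch. This assumes a hypothetical `generate` function standing in for a real language model (here a canned fake, `fake_generate`); the exact cue string "TL;DR:" and the rule of keeping only the first generated line are illustrative choices, not necessarily what OpenAI used.

```python
from typing import Callable


def tldr_prompt(article: str) -> str:
    # Append the cue. The idea is that a model trained only on next-word
    # prediction has presumably seen "TL;DR:" followed by summaries in its
    # training text, so a summary is a likely continuation.
    return article.rstrip() + "\nTL;DR:"


def zero_shot_summary(article: str, generate: Callable[[str], str]) -> str:
    # `generate` is any (hypothetical) function mapping a prompt to the
    # model's continuation of that prompt.
    continuation = generate(tldr_prompt(article))
    # Keep only the first line produced after the cue.
    return continuation.strip().split("\n", 1)[0]


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a real model, returning canned text.
    return " Cats sleep most of the day.\nUnrelated text the model kept producing"


print(zero_shot_summary("Cats often sleep 12 to 16 hours a day.", fake_generate))
# Cats sleep most of the day.
```

No summarization-specific training appears anywhere: the "summarizer" is just a prompt plus whatever the next-word predictor produces.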

The stuff we are observing is in the same vein as this, just even more impressive. But it was not unknown or completely unexpected behavior prior to ChatGPT.

Certainly it was already well known prior to this paper [0], which puts a lower bound of at least 3 years on the timeline.

> Was it just corporate secrets or they don't want bad press?

No, I think it was known and published on, although it was not as impressive as in the most recent iterations. The press didn't realize this was a topic that interested people until 2022 (and really 2023).

[0]: Brown et al., "Language Models are Few-Shot Learners", https://arxiv.org/abs/2005.14165

galaxytachyon 1166 days ago

Thank you. That was informative.