HN Mirror

by redox99 1170 days ago

I don't know about GPT4, but GPT3.5 I'd bet is pretty traditional and boring. Its power comes from a really good, properly curated dataset (including the RLHF).

GPT3.5 turbo is much more interesting probably, because they seem to have found out how to make it much more efficient (some kind of distillation?).

GPT4, if I had to make a very rough guess: probably flash attention, 100% of the (useful) internet/books for its dataset, and highly optimized hyperparameters.

I'd say with GPT4 they probably reached the limit of how big the dataset can be, because they are already using all the data that exists. Thus for GPT5 they'll have to scale in other ways.

4 comments

lexandstuff 1170 days ago

In this interview [1] with Ilya Sutskever, he indicates that they aren't even close to tapping out of data.

[1] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=656s

jakeinspace 1170 days ago

To be fair, if the opposite were true, it might not be wise to admit it. Saturating available high-quality training data is one of the few ways anyone can see OpenAI slowing down.

tome 1170 days ago

Yes, it's a bit strange. I would have thought

1. They would already be using everything they can get.
2. They would easily be able to explain what they're not using, without giving away sensitive secrets.

nr2x 1170 days ago

That interview is mind-blowing stuff.

rcpt 1170 days ago

Yeah, great interview style. It's nonstop content.

fock 1170 days ago

I wonder if we saw the same video - or maybe it is just ChatGPT being "great" in the wild? I see one guy asking another guy simple questions and getting weasel words for an answer.

revelio 1170 days ago

The interviewer and his questions are in some ways more impressive than the answers, which is weird.

rcpt 1170 days ago

Yeah. He's fast and it's just question after question. I didn't realize that was something I wanted to see in interviews.

brendamn 1170 days ago

> I'd say with GPT4 they probably reached the limit of how big the dataset can be

I’m curious about this too; not just on the dataset size, but also the model size. My hunch is that the rapid improvements of the underlying model by making it bigger/giving it more data will slow, and there’ll be more focus on shrinking the models/other optimisations.

int_19h 1169 days ago

I don't think we're anywhere close to the limit of sheer hardware scalability on this. Returns are diminishing, but if GPT-4 (with its 8k+ context window) is any indication, even those diminishing returns are still very worthwhile.

If anything, I wonder if the actual limit that'll be hit first will be the global manufacturing capacity for relevant hardware. Check out the stock price of NVDA since last October.

flangola7 1170 days ago

According to financial reports they are building a $225 million supercomputer for AI. What we can probably expect is the same dataset with even more compute run on it.

theturtletalks 1170 days ago

Is there a limit on how big the context size can be?

PeterisP 1170 days ago

There is a soft limit due to the computation required; the currently used model architectures are quadratic with respect to context size, so if you want ten times larger context size, that's going to need a hundred times more effort.
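To make the arithmetic concrete, here is a rough sketch of the quadratic term only. It assumes a single attention head and counts the multiply-adds needed to form the n x n score matrix; the function name `attention_score_ops`, the `head_dim` of 64, and the two context lengths are illustrative assumptions, not figures for any real model. The rest of a transformer grows roughly linearly with context, so the whole-model ratio would be somewhat lower than this.

```python
def attention_score_ops(context_len: int, head_dim: int = 64) -> int:
    """Rough multiply-add count for one head's n x n attention score matrix.

    Every token is compared with every other token (n * n pairs), and each
    comparison is a dot product over head_dim values. head_dim=64 is an
    assumed, illustrative value.
    """
    return context_len * context_len * head_dim


# Hypothetical context lengths: a baseline and one ten times larger.
base = attention_score_ops(8_000)
bigger = attention_score_ops(80_000)

# Ratio of the quadratic term: 10x the context gives 100x the work.
ratio = bigger / base
print(ratio)
```

The head dimension cancels out of the ratio, which is why only the context length matters for this term.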

kickette 1170 days ago

There’s no theoretical limit
