by Jensson 832 days ago

I think he disagrees with 4:

4. Language prediction training will not get stuck in a local optimum.

Most objectives we have trained models on would have been better served if the model had developed AGI, but none of them did. There is no reason to expect LLMs to avoid getting stuck in a local optimum as well, and I have seen no good argument for why they wouldn't get stuck like everything else we have tried.

2 comments

sigmoid10 832 days ago

There is very little rigorous mathematics on the theoretical side of this. All we have are empirics, but everything we have seen so far suggests that more compute means more capabilities. That's what they are referring to in the blog post. This is particularly true for the current generation of models, but if you look at the whole history of modern computing, the trend roughly holds up over the last century. Following this trend, we can extrapolate that we will reach computers with raw compute power similar to the human brain for under $1000 within the next two decades.

leereeves 832 days ago

More compute also requires more data - scaling equally with model size, according to the Chinchilla paper.

How much more data is available that hasn't already been swept up by AI companies?

And will that data continue to be available as laws change to protect copyright holders from AI companies?
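
For a rough sense of the Chinchilla point, here is a minimal sketch. It assumes two commonly quoted rules of thumb that are not stated in this thread: training compute of about 6 FLOPs per parameter per token, and roughly 20 training tokens per parameter at the compute-optimal point. The function name and constants are illustrative, not taken from the paper's code.

```python
import math

# Assumptions (rules of thumb, not from this thread):
#   training compute C ~ 6 * N * D FLOPs for N parameters and D tokens
#   compute-optimal data D ~ 20 * N tokens
FLOPS_PER_PARAM_TOKEN = 6.0
TOKENS_PER_PARAM = 20.0


def compute_optimal_allocation(compute_flops):
    """Split a FLOP budget into (parameters, tokens) under the assumptions above."""
    # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N^2.
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    for budget in (1e21, 1e23, 1e25):
        n, d = compute_optimal_allocation(budget)
        print(f"C={budget:.0e} FLOPs -> N={n:.2e} params, D={d:.2e} tokens")
```

Under these assumptions both parameters and tokens grow with the square root of compute, so a 100x larger budget calls for about 10x more data.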

sigmoid10 832 days ago

It's not just the volume of original data that matters here. From empirics we know performance scales roughly like (model parameters)*(training data)*(epochs). If you increase any one of those, you can be certain to improve your model. In the short term, training data volume and quality have given a lot of improvements (especially recently), but in the long run it was always model size and total time spent training that saw improvements. In other words: It doesn't matter how you allocate your extra compute budget as long as you spend it.

leereeves 832 days ago

In smaller models, not having enough training data for the model size leads to overfitting. The model predicts the training data better than ever, but generalizes poorly and performs worse on new inputs.

Is there any reason to think the same thing wouldn't happen in billion parameter LLMs?
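
A toy sketch of the overfitting effect described above, using only the standard library. Everything here is an illustrative assumption: the data is a noisy line, the "large" model is a lookup table that memorizes one value per training point, and the "small" model is a two-parameter least-squares line. It says nothing about LLMs directly.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_memorizer(xs, ys):
    """Model with one stored value per training point (nearest-neighbour lookup)."""
    table = list(zip(xs, ys))

    def predict(x):
        return min(table, key=lambda pair: abs(pair[0] - x))[1]

    return predict


def mse(predict, xs, ys):
    """Mean squared error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: y = 2x plus a fixed alternating "noise" of +/-0.5.
train_x = [float(i) for i in range(10)]
train_y = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]

# Held-out data: noise-free points between the training inputs.
test_x = [x + 0.25 for x in train_x]
test_y = [2.0 * x for x in test_x]

slope, intercept = fit_line(train_x, train_y)


def line(x):
    return slope * x + intercept


memo = make_memorizer(train_x, train_y)

if __name__ == "__main__":
    print("memorizer train/test MSE:", mse(memo, train_x, train_y), mse(memo, test_x, test_y))
    print("line      train/test MSE:", mse(line, train_x, train_y), mse(line, test_x, test_y))
```

The memorizer fits the training set perfectly but does worse than the line on the held-out points, which is the pattern the comment describes for small models with too little data.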

sigmoid10 832 days ago

This happens in smaller models because you reach parameter saturation very quickly. In modern LLMs and with current datasets, it is very hard to even reach this point, because the total compute time boils down to just a handful of epochs (sometimes even less than one). It would take tremendous resources and time to overtrain GPT-4 in the same way you would overtrain convnets from the last decade.

Davidzheng 832 days ago

True, but from general theory you should also expect any function approximator to exhibit intelligence when exposed to enough data points from humans; the only question is the speed of convergence. In that sense we do have a guarantee that it will reach human ability.

sigmoid10 832 days ago

It's a bit more complicated than that. Your argument is essentially the universal approximation theorem applied to perceptrons with one hidden layer. Yes, such a model can approximate any algorithm to arbitrary precision (which by extension includes the human mind), but it is not computationally efficient. That's why people came up with things like convolution or the transformer. For these architectures it is much harder to say where the limits are, because the mathematical analysis of their basic properties is vastly more complex.

samatman 831 days ago

LLMs aren't improving at things they're unable to do at all. Reasoning is one example.

olalonde 831 days ago

LLMs can reason. You can verify this empirically by asking a model such as GPT-4 questions that require reasoning.

lossolo 831 days ago

There is evidence that they actually can't reason:

https://arxiv.org/abs/2311.00871

https://arxiv.org/abs/2309.13638

https://arxiv.org/abs/2311.09247

https://arxiv.org/abs/2305.18654

https://arxiv.org/abs/2309.01809

samatman 831 days ago

False. https://arxiv.org/abs/2308.03762

olalonde 831 days ago

This is not peer-reviewed research and has some serious issues: https://news.ycombinator.com/item?id=37051450

Zambyte 832 days ago

It sounds like you're arguing against LLMs as AGI, which we're on the same page about.
