Hacker News
by fieldcny 744 days ago
Ahh, the ole "if you can sense every particle's position and velocity you can predict the future."

Your comment really betrays the desperation that exists now. These models are stuck where they are (hint: it's a natural limit). You are talking about an exponential increase in cost to get what, a 10% improvement? 5%? There are very few places where it's net positive to run them now, and most of those are incredibly shitty things like creating trash marketing content to drown us all in average inanity.

I really feel bad for this next generation, they will just be constantly inundated with generated crap, so much of the high fidelity of conversation and meaning is and will be lost.


Desperation? We have had huge advances in just the last 1.5 years, with things I wouldn't have thought would be possible within the next 10 years. After decades of research with far slower progress, all of a sudden we now know that we have hit a wall?

And I am not talking about predicting the future, but more predicting the next action to take based on current state, sensor data in a more seamless way. Like a human being reacting to different input by moving their muscles, etc. There would be a huge amount of training data from there that could be incorporated into a single model.

> And I am not talking about predicting the future, but more predicting the next action to take based on current state, sensor data in a more seamless way.

Like self-driving cars?

Self-driving cars are an engineering problem, let alone an AI problem, and we still cannot solve it despite trillion-dollar economic incentives.

Just putting together some LLMs on a fuckton of data does not work. Tesla tried that, and failed.

Tesla has been using sparse data to train their models, because they needed to prioritize fast on device inference.

Completely different solution applied to a completely different problem with completely different risk and quality tolerances with completely different mitigations.

Self-driving cars don't use LLMs, so the comparison is invalid; it's a different technology.

The point is to try and see if LLMs' wide general knowledge can have an advantage in something like sensory data + action learning as well. Current self-driving models don't have that.

Actions typically consist of a series of small steps.

Given that LLMs are inaccurate around 5-10% of the time, each step will compound the error rate until you are better off flipping a coin.
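The compounding claim can be made concrete with a rough sketch. It assumes every step fails independently with the same fixed error rate, which is a simplifying assumption of this example and not something the comment establishes; the 5-10% figures are the commenter's. The function names are made up for illustration.

```python
def chain_success(per_step_error: float, steps: int) -> float:
    """Probability that all `steps` succeed, assuming independent steps
    that each fail with probability `per_step_error` (an assumption)."""
    return (1.0 - per_step_error) ** steps


def steps_until_coin_flip(per_step_error: float) -> int:
    """Smallest number of steps at which the whole chain succeeds
    less than half the time."""
    if not 0.0 < per_step_error < 1.0:
        raise ValueError("per_step_error must be strictly between 0 and 1")
    steps = 0
    success = 1.0
    while success >= 0.5:
        steps += 1
        success *= 1.0 - per_step_error
    return steps


if __name__ == "__main__":
    for error in (0.05, 0.10):
        n = steps_until_coin_flip(error)
        print(f"{error:.0%} per-step error: below 50% after {n} steps "
              f"(chain success {chain_success(error, n):.3f})")
    # 5% per-step error drops below 50% at 14 steps, 10% at 7 steps.
```

Under these assumptions the result is very sensitive to the per-step error rate, so the disagreement in the thread is really about what that rate is for a given task.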

I don't understand this stretch of logic. It absolutely depends on the type of problem where they are inaccurate and how well trained they are in it. There is no way you can extrapolate like this.

You can ask them to do a math equation which takes steps, and if they are trained in that, for certain problems they are accurate near 100 percent of the time.

Like ask gpt-4o to solve different variations of

"""What is the answer to 2x + 7 = 31?"""

If the numbers are of similar magnitude and simplicity, it will follow the same steps and be right 99%+ of the time. I'm only not saying 100% because I haven't tried it enough, but I don't see it being wrong.

For example """What is the answer to 2x + 4 = -6?"""

Just run a test yourself. Do random integers within 0 - 20; it will definitely not be incorrect 5% - 10% of the time. It will be correct 99%+ of the time.
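A minimal sketch of such a test, for anyone who wants to run it. Everything here is hypothetical scaffolding: `answer_fn` is a placeholder for whatever function sends the prompt to a model and parses its reply into a number, and no real model API is called. `exact_solver` is only a stand-in so the harness runs on its own. The leading coefficient is drawn from 1 - 20 instead of 0 - 20 to avoid dividing by zero.

```python
import random
import re
from fractions import Fraction
from typing import Callable, List, Tuple

Problem = Tuple[int, int, int]  # (a, b, c) in a*x + b = c


def make_problems(n: int, seed: int = 0) -> List[Problem]:
    """Random equations with small integer coefficients; a is never zero."""
    rng = random.Random(seed)
    return [(rng.randint(1, 20), rng.randint(0, 20), rng.randint(0, 20))
            for _ in range(n)]


def prompt_for(problem: Problem) -> str:
    a, b, c = problem
    return f"What is the answer to {a}x + {b} = {c}?"


def expected_answer(problem: Problem) -> Fraction:
    a, b, c = problem
    return Fraction(c - b, a)


def accuracy(answer_fn: Callable[[str], Fraction],
             problems: List[Problem]) -> float:
    """Fraction of prompts for which answer_fn returns the exact solution."""
    correct = sum(1 for p in problems
                  if answer_fn(prompt_for(p)) == expected_answer(p))
    return correct / len(problems)


def exact_solver(prompt: str) -> Fraction:
    """Stand-in for a model: parses the prompt and solves it exactly."""
    a, b, c = map(int, re.search(r"(\d+)x \+ (\d+) = (-?\d+)", prompt).groups())
    return Fraction(c - b, a)


if __name__ == "__main__":
    problems = make_problems(200)
    print(f"accuracy: {accuracy(exact_solver, problems):.1%}")
```

To test a real model, replace `exact_solver` with a function that queries it. Note that this measures single-question accuracy only, not accuracy over a multi-step agent run.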

Where is this number 5% - 10% even coming from? You could also keep asking it "What is the capital of France?" and it's going to be right 99%+ of the time.

You are conflating asking a single question to ChatGPT versus AI agents which typically need to interact with an LLM multiple times.

And the 5-10% is on average and gets significantly worse as you expand the context length which is also something you want for an agent.

There weren't really any advancements from around 2018. The majority of the 'advancements' were in the number of parameters, training data, and its applications. What was the GPT-3 to ChatGPT transition? It involved fine-tuning, using specifically crafted training data. What changed from GPT-3 to GPT-4? It was the increase in the number of parameters, improved training data, and the addition of another modality. From GPT-4 to GPT-4o? There was more optimization and the introduction of a new modality. The only thing left that could further improve models is to add one more modality, which could be video or other sensory inputs, along with some optimization and more parameters. We are approaching diminishing returns.

> There weren't really any advancements from around 2018.

Not sure what that means. Why are you marking those as "quotes"?

The last actions brought so many returns. And it's unknown what the exact effect would be in adding more modalities, training data, optimisations and even just plain parameters.

Text as training data can only get you so far. Giving real-time sensory data from many fields could allow an LLM-like system to control robots and get even more data from real life. E.g. robot hand movements, object tracking data, all of that to be fed into LLMs, and see how it would work.

My understanding (roughly) is that the way we got here was kind of by surprise. We've had a lot of the fundamental algorithms for a long time, but we ran into sort of a happy surprise when transformer models got scaled up - suddenly they started doing interesting things. Scaling them up even more made them start to do potentially useful things.

That happy discovery was never really a linear improvement path, though. We had an explosion of capability, but all along there have been active questions about how far the improvements would go with the current approach.

I think the point that a lot of researchers are making is that we're starting to see those limits (with LLMs, at least).

There are also a lot of questions around business model and cost/value prop. Training and running these things at scale is enormously expensive. I'm seeing a lot of FOMO and gold rush mentality in the space, similar to the online streaming wars, and I'm not convinced of the long-term viability of a lot of the companies. Especially once open models like Llama are "good enough" and become commodities.

Of course, it's still early days and there's a ton of room for discovery, but it looks like we'll hit a limit with the current approach pretty soon.

Personally, I'd be OK with that. With the current state of things we have an interesting toy that can sometimes do useful work. It's an incremental quality of life improvement and another good tool in the chest, but it's not a civilization impacting technology.

That's probably for the best.

My question still stands. How could anyone besides OpenAI be confident of there being a limit if no one has managed to build as strong a model as OpenAI has so far? Only Claude Opus seems close, but it is still weaker at reasoning than GPT-4o. Better at creative writing though.

And after only 1.5 years? And especially if we just had a happy surprise like you mentioned. How does it make sense to already start claiming that we have hit the limits? How do we know there is no more scaling, optimisations and happy surprises?

Quoting GP for context:

> That happy discovery was never really a linear improvement path, though. We had an explosion of capability, but all along there have been active questions about how far the improvements would go with the current approach.

> I think the point that a lot of researchers are making is that we're starting to see those limits (with LLMs, at least).

The kinds of limitations we're "starting to see" are largely the same as they were a year ago. People were talking about it on here back then, but now it's becoming more apparent to more people as they get used to LLMs.

For those who saw it back then, this does look like we're hitting a limit. For others, not so much.

I'm not sure I understand this?

How do active questions about a technology imply we are approaching a brick wall?

How could researchers without access to the latest state of the art (from OpenAI or any other unknown companies) even test whether we are approaching a brick wall? It seems to me that it would take trillions to find out what the exact limit is.

It's possible that we will get diminishing returns, but I don't see how we can confidently claim or know it?

> The kinds of limitations we're "starting to see" are largely the same as they were a year ago. People were talking about it on here back then, but now it's becoming more apparent to more people as they get used to LLMs.

I don't follow. GPT-3.5 was borderline useless at reasoning. But it still seemed amazing, and like something I wouldn't have thought possible in the near future.

And then GPT-4 was a crazy advancement over that to me. And I've been using it daily since it was available, for various use-cases. Are you saying we are seeing the limitations of GPT-4 specifically? Because, sure, GPT-4 is far from AGI, but I don't see how this implies that further scaling, optimisation, training data improvements, techniques like multi modality and other potential strategies that I might not be aware of couldn't bring another explosive step?

Also, the fact that GPT-4's reasoning skill hasn't been reproduced by anyone else so far leaves me thinking that everyone except OpenAI is clueless. Claude Opus is close, like I've said before, but not quite at GPT-4 levels in the specific reasoning tasks that I'm using the API for.

If you can't reproduce GPT-4, how could we trust the assessment that we have hit a limit?

Isn’t the statement “being stuck” a bit like an attempt at predicting the future? You don’t know how long something will stay stuck…

I think a very common error when it comes to personal learning or progress is confusing a plateau with a brick wall. The reason is, unless you have already walked the path, it’s not possible to differentiate them. And when it comes to progress, no one has already walked the path, hence no one knows actually.

I think there are nuanced distinctions between:

- something that can't be modeled because there's no training data

- something that can't be modeled because it's fundamentally stochastic

- something that can't be modeled because the discrepancy in simulating the generating process, for your specific model, can, basically, be made arbitrarily large

Why can fundamentally stochastic things not be modeled? Monte Carlo simulations were literally some of the first computer programs.

"Can't be modeled" is probably not what I should have said. Rather, I meant more like the error for a globally optimal model is still high.
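A toy illustration of that last point, using a fair six-sided die as a stand-in for a fundamentally stochastic process (the die and the squared-error measure are choices made for this example, not anything from the thread). The process is easy to model and simulate, yet even the best possible single guess still has a large expected error.

```python
import random
from fractions import Fraction

FACES = range(1, 7)  # outcomes of a fair six-sided die


def expected_sq_error(guess: Fraction) -> Fraction:
    """Exact expected squared error of always predicting `guess`."""
    return sum((Fraction(face) - guess) ** 2 for face in FACES) / 6


def simulated_sq_error(guess: float, rolls: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    return sum((rng.randint(1, 6) - guess) ** 2 for _ in range(rolls)) / rolls


if __name__ == "__main__":
    best_guess = Fraction(7, 2)  # the mean minimizes squared error
    print("exact error of the optimal guess:", expected_sq_error(best_guess))
    print("simulated:", simulated_sq_error(3.5, 100_000))
```

The optimal guess is the mean, 3.5, and its expected squared error is 35/12 (about 2.92). More data or a bigger model cannot push it lower, because the remaining error comes from the process itself.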