by mianos 933 days ago
That has been covered quite well in multiple places. Andrej, as usual, has some good reasons in his recent talk: https://www.youtube.com/watch?v=zjkBMFhNj_g

Most professionals working on the technical side of the field do not consider just increasing the size of pre-trained LLMs a likely or simple path to AI.

Thank you, that was a fascinating talk and I learned quite a bit.

However, it did not provide a convincing argument as to why LLMs cannot be a part of a "doomer" AI. In fact, I got the opposite vibe from Andrej explaining expected future developments. The whole section on System 2 thinking sounds like a layer constructed around dumb LLMs that would result in vastly improved and more generalizable intelligence.

I agree that just scaling the size of LLMs is probably not sufficient for AGI...but that just seems like one relatively minor piece of all the possible ways it might be achieved.

No argument from me. LLMs would be a component of AI, much like we kind of have long-term read-only memory. But the extra bits could be in the form of some dynamic functions on each tensor network node (G* functions LOL!)

It's interesting to go back and read the MIT OpenAI story from 2020 to hear how thinking about these moral implications was an important part of OpenAI.

Most professionals didn't think we were close to surpassing human capability in chess, Go, or Dota until after it happened. I've seen little evidence that expert domain knowledge improves AI forecasting ability; if anything, it seems the experts are often late to the party.

Besides expert consensus, is there any other actual argument against LLMs achieving generalizability?

> Besides expert consensus,

Well, there are solid technical reasons, as described in the video. One of them is that these models are 'pre-trained' and AGI may be a result of a more dynamic knowledge base that can change more than just the local context and update the model, as our brain does.

Andrej also suggests that an attribute of a more advanced AI would be the ability to ask it to spend longer thinking to get a better answer, like a chess engine.

That said, expert consensus is probably the best answer we have. It's not like the consensus of a bunch of YouTube videos and articles that only exist to get clicks. These experts are famously sharp. I have done his course video series (it took a huge effort, even though he is an amazing lecturer), I already had Python and linear algebra experience, and I understand his argument.

>these models are 'pre-trained' and AGI may be a result of a more dynamic knowledge base

Why couldn't the knowledge base be used in conjunction with the LLM? As the GP said, why can't LLMs gain sentience or be finagled into sentience with a 'wrapper'? The knowledge base you're describing is the wrapper.

>Andrej also suggests that an attribute of a more advanced AI would be the ability to ask it to spend longer thinking to get a better answer, like a chess engine.

This is another method that is already being deployed with LLMs. So the question stands: why won't LLMs be the foundation for nearing AGI?

For my money, LLMs likely are that base. AI experts are either too shy from the memory of AI winters past to see the nose on their faces, or too busy developing paradigm-breaking models to care. Regardless of what Chomsky or any other 'expert' says should be possible, the practical results of LLM growth are literally speaking for themselves.

Maybe we should have suspected a 'large language game' to be the catalyst for AGI from the start. Was human intelligence truly general before we developed language? Could it be general without it?

I watched the talk and I didn't see him give those reasons.