Hacker News
by namedlambda 1760 days ago
Everything in the above comment is wrong.

In AlphaZero, for example, training ran for 700,000 steps over 9 hours and used 44 million training games in total.

Converting that to human-scale numbers: 44 million games, averaging 60 moves each, at 1 second of thinking time per move:

> 44000000*60/60/60/24/365 ≈ 83.71 years of training experience in 9 hours
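The same back-of-the-envelope estimate as a short Python sketch. The 60 moves per game and 1 second per move are the assumptions stated above, not measured AlphaZero figures; the speedup line is an extra derived number, comparing that human-scale play time to the 9 hours of wall-clock training.

```python
GAMES = 44_000_000      # total AlphaZero training games (figure quoted above)
MOVES_PER_GAME = 60     # assumed average game length
SECONDS_PER_MOVE = 1    # assumed human thinking time per move
TRAINING_HOURS = 9      # wall-clock training time quoted above

total_seconds = GAMES * MOVES_PER_GAME * SECONDS_PER_MOVE
years = total_seconds / (60 * 60 * 24 * 365)          # 365-day years, ignoring leap years
speedup = total_seconds / (TRAINING_HOURS * 60 * 60)  # human play time / wall-clock time

print(f"{years:.2f} years of play, about {speedup:,.0f}x real time")
```

This prints 83.71 years and a speedup of roughly 81,000x over real time.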

Across the whole field of reinforcement learning, agents train by playing games for many orders of magnitude longer than a human ever will. In fact, this can be scaled to over 100k actions per second on a single machine:

https://github.com/alex-petrenko/sample-factory

Then there is distributed reinforcement learning, where hundreds of agents play on different machines and share their experience; see AlphaZero, Leela Zero, R2D2, R2D3, Ape-X, ACER, and asynchronous PPO.
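A single-process Python sketch of the experience-sharing idea: several actor threads stand in for separate machines, each runs its own copy of a toy environment, and all of them push transitions into one shared replay buffer that a learner samples from. The random-walk environment, the hypothetical `SharedReplayBuffer` class, and the thread-per-actor setup are illustrative assumptions; real distributed agents use separate processes or machines and a networked buffer.

```python
import random
import threading
from collections import deque


class SharedReplayBuffer:
    """Lock-protected FIFO buffer shared by all actors (stand-in for a networked buffer)."""

    def __init__(self, capacity: int) -> None:
        self._data = deque(maxlen=capacity)  # oldest transitions drop out when full
        self._lock = threading.Lock()

    def add(self, transition: tuple) -> None:
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> list:
        with self._lock:
            return rng.sample(list(self._data), min(batch_size, len(self._data)))

    def __len__(self) -> int:
        with self._lock:
            return len(self._data)


def actor(actor_id: int, buffer: SharedReplayBuffer, steps: int) -> None:
    """Runs a private copy of a hypothetical toy environment: a random walk on the integers."""
    rng = random.Random(actor_id)  # different seed per actor, so each explores differently
    state = 0
    for _ in range(steps):
        action = rng.choice((-1, 1))
        next_state = state + action
        reward = 1.0 if action == 1 else 0.0  # toy reward: stepping right pays 1
        buffer.add((actor_id, state, action, reward, next_state))
        state = next_state


def collect(num_actors: int = 8, steps_per_actor: int = 1_000):
    buffer = SharedReplayBuffer(capacity=100_000)
    threads = [threading.Thread(target=actor, args=(i, buffer, steps_per_actor))
               for i in range(num_actors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # A learner would now train on minibatches that mix every actor's experience.
    batch = buffer.sample(256, random.Random(0))
    return buffer, batch


buffer, batch = collect()
print(len(buffer), "transitions from", len({t[0] for t in batch}), "actors in one batch")
```

Each minibatch the learner draws mixes transitions gathered by many actors, which is what lets experience collected in one place improve the shared policy.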

> but the data isn't useful without the context of experience

The experience is the data in Reinforcement Learning.

> and all processing power can do is overfit a model without experience.

That is wrong: agents perform what is called exploration precisely to avoid getting stuck in simple strategies.
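A minimal Python sketch of why exploration matters, using epsilon-greedy action selection on a two-armed bandit (a standard textbook exploration scheme, not the specific method of any agent named above). The payout rates are made up for illustration.

```python
import random


def run_bandit(epsilon: float, steps: int = 2000, seed: int = 0):
    """Two-armed Bernoulli bandit; arm 1 is the better arm (hypothetical payout rates)."""
    rng = random.Random(seed)
    payout = [0.3, 0.8]      # true success probability of each arm
    estimates = [0.0, 0.0]   # running mean reward per arm
    pulls = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                           # explore: pick a random arm
        else:
            arm = max(range(2), key=lambda a: estimates[a])  # exploit; ties go to arm 0
        reward = 1.0 if rng.random() < payout[arm] else 0.0
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]  # incremental mean
    return estimates, pulls


greedy_est, greedy_pulls = run_bandit(epsilon=0.0)
explore_est, explore_pulls = run_bandit(epsilon=0.1)
print("greedy pulls:", greedy_pulls)
print("epsilon-greedy pulls:", explore_pulls)
```

The purely greedy run pulls arm 0 all 2,000 times, because its estimate for arm 1 never moves off zero; that is the "stuck in a simple strategy" failure. With epsilon = 0.1 the agent occasionally tries arm 1, sees that it pays more, and switches to it.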

> Even if we put AI into an army of robots running around and experiencing things, there are still scaling limits to encoding and communicating knowledge and understanding.

True, but machines scale better because they speak the same language, or they can learn to tune their language to get their message across.
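One concrete illustration of agents learning a shared convention is the Lewis signaling game, sketched here in Python with simple Roth-Erev (urn) reinforcement. The two-state setup, initial weights, and number of rounds are illustrative assumptions. A sender sees a state and emits a signal; a receiver sees only the signal and guesses the state; both reinforce whatever they did when the guess is right. No meaning is built into the signals, yet the pair tends to settle on a convention that gets the message across.

```python
import random


def train_signaling(n: int = 2, rounds: int = 20_000, seed: int = 0):
    """Lewis signaling game with n states, n signals and n possible guesses."""
    rng = random.Random(seed)
    # Urn weights: sender[state][signal] and receiver[signal][guess], starting uniform.
    sender = [[1.0] * n for _ in range(n)]
    receiver = [[1.0] * n for _ in range(n)]
    for _ in range(rounds):
        state = rng.randrange(n)
        signal = rng.choices(range(n), weights=sender[state])[0]
        guess = rng.choices(range(n), weights=receiver[signal])[0]
        if guess == state:  # message got across: reinforce both choices
            sender[state][signal] += 1.0
            receiver[signal][guess] += 1.0
    return sender, receiver


def success_probability(sender, receiver) -> float:
    """Expected chance that the receiver guesses right, with states drawn uniformly."""
    n = len(sender)
    total = 0.0
    for state in range(n):
        for signal in range(n):
            p_signal = sender[state][signal] / sum(sender[state])
            p_right = receiver[signal][state] / sum(receiver[signal])
            total += p_signal * p_right
    return total / n


# With untrained (uniform) weights the receiver is right half the time.
sender, receiver = train_signaling()
print(f"success rate after training: {success_probability(sender, receiver):.3f}")
```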

> Human organizations are a great example of the scaling limits of intelligence.

Human organization is a testament to how far we can get with something as limiting as everyday language. The language we use to communicate is subject to misinterpretation because of our subjective experiences; machines do not share this limitation.


If the universe is a game you are playing, then yes, playing that game is "experience"; but for an AI to engage with reality, it has to have experience in reality, not in a game. The ability to play go very well doesn't enable an AGI to better understand reality.

> The experience is the data in Reinforcement Learning.

This is very true, and it is the critical problem. Data about how reality responds to an AI's actions is very sparse right now.

AIs do have a potential advantage in communication efficiency, but at some scale, compression will happen: locally "irrelevant" data will be discarded and replaced with simplified approximations. None of this will change the "big O" of the scalability of intelligence, only the constant factors. There is no exponential kickoff point.

What is the difference between experiencing reality and a game?

The one difference I can see is that there is no single explicit objective function, but this doesn't stop generally capable agents [1], nor does it imply that inverse RL is impossible.

> The ability to play go very well doesn't enable an AGI to better understand reality.

I disagree. Model-based RL constructs a model of the agent's reality and can use it to plan ahead, to train the agent, or to run some form of Monte Carlo tree search. The latter is very similar to how we imagine and think about the future.
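To make "build a model, then train the agent on it" concrete, here is a small Dyna-Q sketch in Python. Dyna-Q is one classic model-based method; it plans by replaying transitions from a learned model rather than by Monte Carlo tree search. The six-state chain environment and all hyperparameters are illustrative assumptions. After each real step the agent records what the world did in its model, then performs extra Q-learning updates on imagined transitions drawn from that model.

```python
import random

N_STATES = 6      # states 0..5 of a hypothetical chain; state 5 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def env_step(state: int, action: int):
    """Toy deterministic environment: reward 1 for reaching the goal, else 0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=20, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (state, action) -> (reward, next_state, done)

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])  # random tie-break

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s = 0
        for _ in range(200):  # cap episode length
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = env_step(s, a)
            update(s, a, r, s2, done)        # learn from real experience
            model[(s, a)] = (r, s2, done)    # remember what the world did
            for _ in range(planning_steps):  # learn from imagined experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
            if done:
                break
    return q


q = dyna_q()
print([q[(s, 1)] > q[(s, 0)] for s in range(N_STATES - 1)])
```

After training, the greedy policy moves right in every non-terminal state, even though most of the updates came from the learned model rather than from the environment.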

[1] https://deepmind.com/blog/article/generally-capable-agents-e...

I'm in total agreement about the potential of growing AGI out of these methods, but there will be bottlenecks well before the gods of the singularity come knocking.

> What is the difference between experiencing reality and a game?

Finding out the consequences of an action is a lot more expensive in reality than in a simulation of a game.

There is nothing fundamentally different between an infinite-horizon game and reality.
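For reference, the standard way RL formalizes an infinite-horizon game is the discounted return: the agent maximizes the expected sum of future rewards with no final step, and a discount factor keeps that sum finite. The discount factor and reward bound are part of that standard formulation, not something stated in this thread:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad 0 \le \gamma < 1,
\qquad |r_t| \le R_{\max} \;\Rightarrow\; |G_t| \le \frac{R_{\max}}{1-\gamma}.
```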