
by Cybiote 3467 days ago

There is also a difference between learning and playing. During play, the human ran on ~20 watts of computation while the machine drew anywhere from 26,000 to 260,000 watts, depending on how efficient the TPUs are (taking a 10x efficiency gain as the ideal case). The human is also learning new things about Go as it plays, planning complex muscle firing programs, filtering audio and managing attention, working on subconscious goals, and running complex vision tasks, all while running its autonomic subsystem.

Low power still matters because of heat and energy availability. It also implies high efficiency, which is important for several reasons.

The human brain is estimated to draw about 20 watts. (When people talk about computing systems, they tend not to include the power for all the auxiliary infrastructure needed to keep them networked and cooled.) It's also estimated that learning effectiveness drops precipitously beyond 4 hours a day.

If we take the case of Go, you can take a 4-year-old human and have a professional player by 13. That is about 950 megajoules spent by the brain while learning Go. For the machine, if you look at the learning part (self-play, value and policy networks on 50 GPUs for several weeks), the estimated energy spend is about 30,000 megajoules. The policy network alone is ~20,000 MJ, while the full AlphaGo system playing on a single GPU and 48 CPUs is just a strong amateur.
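The ~950 MJ figure appears to follow from the numbers already given: 20 watts, 4 hours of effective learning a day, from age 4 to 13. A rough sketch of that arithmetic follows. The assumption of practice on all 365 days of the year is mine, not stated in the comment, and the 30,000 MJ machine estimate is taken from the comment as-is.

```python
# Back-of-envelope check of the human learning-energy estimate.
BRAIN_POWER_W = 20           # estimated brain power draw (from the comment)
STUDY_HOURS_PER_DAY = 4      # effective learning limit (from the comment)
YEARS = 13 - 4               # age 4 to professional at 13 (from the comment)
DAYS_PER_YEAR = 365          # assumption: practice every day of the year
MACHINE_TRAINING_MJ = 30_000 # machine training estimate (from the comment)


def learning_energy_mj(power_w, hours_per_day, years, days_per_year=DAYS_PER_YEAR):
    """Energy in megajoules: watts times total seconds of study, divided by 1e6."""
    seconds = hours_per_day * 3600 * days_per_year * years
    return power_w * seconds / 1e6


human_mj = learning_energy_mj(BRAIN_POWER_W, STUDY_HOURS_PER_DAY, YEARS)
ratio = MACHINE_TRAINING_MJ / human_mj

print(f"human: {human_mj:.0f} MJ, machine/human ratio: {ratio:.0f}x")
```

Under these assumptions the human side comes to roughly 946 MJ, so the machine's training estimate is about 30 times larger.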

But this is not even an apples-to-apples comparison, since the brain is not spending all of its energy on learning Go. In fact, learning how to play Go is very far from the most difficult thing the brain is learning how to do.

2 comments

lern_too_spel 3467 days ago

You're confusing professional level with world champion level. How many megajoules will it take to create a world champion Go player using the human brain? It would take multiple brains, each teaching the others. We can now train a professional-level Go player pretty cheaply: Zen Go plays at a professional level and runs on commodity hardware.

Cybiote 3466 days ago

I did not, actually. I pointed out the approximate energy required to get to 1-dan professional, then pointed out that a system trained with orders of magnitude more energy was still far less capable. Getting to Lee Sedol's level is still < 3000 MJ (30 years of daily practice and study), which is still an order of magnitude less energy than training a single amateur-level policy network.

Saying that AlphaGo or any RL system is learning from self-play does not match the typical understanding of the phrase. It's more akin to evolving through competitions against previous versions of itself, which should count as different instances. As stated on page 38 of 1604.00289.pdf:

"Between the publication of Silver et al. (2016) and before facing world champion Lee Sedol, AlphaGo was iteratively retrained several times in this way; the basic system always learned from 30 million games, but it played against successively stronger versions of itself, effectively learning from 100 million or more games altogether (Silver, 2016)."

In comparison, from that same paper, it was estimated that Sedol could not have played much more than 50,000 games. My own estimate is about 40,000 games.
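To put the game counts side by side, here is a small sketch using only the figures quoted above. Treating "100 million or more" as exactly 100 million is a simplification on my part, so the ratios are lower bounds.

```python
# Ratio of games seen by AlphaGo to games played by Lee Sedol,
# using the figures quoted in this comment.
ALPHAGO_GAMES = 100_000_000      # "100 million or more" (treated as a lower bound)
SEDOL_GAMES_PAPER = 50_000       # upper estimate attributed to the paper
SEDOL_GAMES_COMMENTER = 40_000   # the commenter's own estimate

ratio_paper = ALPHAGO_GAMES / SEDOL_GAMES_PAPER
ratio_commenter = ALPHAGO_GAMES / SEDOL_GAMES_COMMENTER

print(f"vs paper estimate:     at least {ratio_paper:,.0f}x more games")
print(f"vs commenter estimate: at least {ratio_commenter:,.0f}x more games")
```

Either way the machine saw at least three orders of magnitude more games than the human.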

As for the work required to learn, it's irrelevant to point out that one can learn from others. Learning, whether from play, books or study, still requires energy spend and work by the learner. Most of the extra work is from study, occasional review with a tutor and discussions with peers, the last being more of a meta-step: learning to learn. Accounting for books and some time with tutors will not, I argue, shift the budget much, especially once you include the fact that any machine playing Go requires the overhead of power infrastructure, energy, cooling, networking equipment and occasional maintenance staff. And on the machine side, learning and improvement in architecture require searching through and discarding many changes and playing through hundreds of millions of games in total.

The fact that humans have other available highly efficient means of learning is a boon and not a downfall. That's the whole point of getting to AGI. Learning from books and others is akin to learning by Program Synthesis from specifications.

lern_too_spel 3466 days ago

As I said in my previous post, you ignored that a machine gets to 1 dan professional far more cheaply.

The reason I noted the requirement of other trained professionals for training a human is that those other humans can distill what they have learned over years of play into simple rules. The machine can also use those rules, but the particular machine you are comparing to was specifically trained without any such rules and was required to synthesize them from scratch from historical games and self-play.

argonaut 3467 days ago

No, that is a ridiculous comparison. Then you should start counting the energy required to construct the Google server farm, the energy of all the computers used by all the engineers who built the farm while they were in university, on and on and on.

lern_too_spel 3466 days ago

Now you're suggesting calculating the energy used by the human champion's ancestors, which I am not suggesting.

The computer can train itself with just records of past games and self-play. The world champion level human cannot. You must account for the difference in training.

argonaut 3465 days ago

No, the computer can't. The computer was itself trained/programmed by humans.

lern_too_spel 3464 days ago

It was programmed by humans who didn't program in any rules of thumb or patterns for playing Go. The computer synthesized all it needed to understand about how to win at the game from historical games and self-play. The human champion synthesized only some patterns himself and learned most of them from other professional human players.

louprado 3467 days ago

TPU defined: https://en.wikipedia.org/wiki/Tensor_processing_unit
