Hacker News new | ask | show | jobs
by YeGoblynQueenne 2173 days ago
>> In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search.

"Massive, deep search" that started from a book of opening moves and the combined expert knowledge of several chess Grandmasters. And that was an instance of the minimax algorithm with alpha-beta cutoff, i.e. a search algorithm specifically designed for two-player, deterministic games like chess. And with a hand-crafted evaluation function, whose parameters were filled-in by self-play. But still, an evaluation function; because the minimax algorithm requires one and blind search alone did not, could not, come up with minimax, or with the concept of an evaluation function in a million years. Essentially, human expertise about what matters in the game was baked-in to Deep Blue's design from the very beginning and permeated every aspect of its design.

Of course, ultimately, search was what allowed Deep Blue to beat Kasparov (3½–2½; Kasparov won two games and drew another). That, in the sense that the alpha-beta minimax algorithm itself is a search algorithm and it goes without saying that a longer, deeper, better search will inevitably eventually outperform whatever a human player is doing, which clearly is not search.

But, rather than an irrelevant "bitter" lesson about how big machines can perfom more computations than a human, a really useful lesson -and one that we haven't yet learned, as a field- is why humans can do so well without search. It is clear to anyone who has played any board game that humans can't search ahead more than a scant few ply, even for the simplest games. And yet, it took 30 years (counting from the Dartmouth workshop) for a computer chess player to beat an expert human player. And almost 60 to beat one in Go.

No, no. The biggest question in the field is not one that is answered by "a deeper search". The biggest question is "how can we do that without a search"?

Also see Rodney Brook's "better lesson" [2] addressing the other successes of big search discussed in the article.

_____________

[1] https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)#Des...

[2] https://rodneybrooks.com/a-better-lesson/

4 comments

>But, rather than an irrelevant "bitter" lesson about how big machines can perfom more computations than a human, a really useful lesson -and one that we haven't yet learned, as a field- is why humans can do so well without search

I think the answer is heuristics based on priors(e.g. board state), which we've demonstrated (with alphago and derivatives, especially alphago zero) that neural networks are readily able to learn.

This is why I get the impression that modern neural networks are quickly approaching humanlike reasoning - once you figure out how to

(1) encode (or train) heuristics and

(2) encode relationships between concepts in a manner which preserves a sort of topology (think for example of a graph where nodes represent generic ideas)

You're well on your way to artificial general reasoning - the only remaining question becomes one of hardware (compute, memory, and/or efficiency of architecture).

Are we certain that well-trained human players are not doing search? It's possible that a search subnetwork gets "compiled without debugger symbols" and the owner of the brain is simply unaware that it's happening.
>> Are we certain that well-trained human players are not doing search?

Yes- because human players can only search a tiny portion of a game tree and a minimax search of the same extent is not even sufficient to beat a dedicated human in tic-tac-to, leta lone chess. That is, unless one wishes to countenance the possibility of an "unconscious search" which of course might as well be "the grace of God" or any such hand-wavy non-explanation.

>> It's possible that a search subnetwork gets "compiled without debugger symbols" and the owner of the brain is simply unaware that it's happening.

Sorry, I don't understand what you mean.

Why do you dismiss the unconscious search that humans do in Go? Having learned Go some years ago it is such an exciting thing to realize that with practice the painstaking process of consciously evaluating the myriads possibilities of moves gives way to just "seeing" solutions out of nothing. You can really feel that your brain did wire itself up to do analysis for you at a level that is subconscious but interfaces so gracefully with your conscious cognition that it is a real marvel.
>> Why do you dismiss the unconscious search that humans do in Go?

The question is why you say that humans perform an unconscious search when they play Go. And what kind of search is it, other than unconscious? Could you describe it, e.g. in algorithmic notation? I mean, I'm sure you couldn't because if you could then the problem of teaching a computer to play Go as well as a human would have been solved years and years ago. But, if you can't describe what you're doing, then how do you know it's a "search"?

Note that in AI, when we talk of "search" (edit: at least, in the context of game-playing) we mean something very specific: an algorithm that examines the nodes of a tree and applies some criterion to label each examined node as a target node or not a target node. Humans are absolutely awful at executing such an algorithm with our minds for any but the most trivial of trees, at least compared to computers.

Here's a section of Michael Redmond's (9-dan professional Go player) commentary on the Lee Sedol vs AlphaGo matches: https://youtu.be/yCALyQRN3hw?t=3031

It's really fun to watch his commentary because he relentlessly plays "variations" — possible next moves and sequences — while waiting for the players, explaining the tradeoffs between moves and the consequences they lead to a few steps ahead in the game.

I don't know what to call "variations" but a tree search with heuristics. He does it slowly to explain it to the audience, but I have no doubt the same process runs much faster in his mind.

Fast enough to evaluate a few million future positions in a few seconds? Like I say in another comment, even professional players cannot "look ahead" more than a few ply, so whatever it is they're doing "in their heads", the tree search they're reporting is not how they win games.

To clarify, you can come up with an explanation of anything that you do, or observe yourself or another person do. For example, you might explain how you hit a ball with racket in tennis or with a bat in baseball, etc, but that doesn't mean that the process you are describing is the process that your mind (let alone your brain) actually follows.

If nothing else because such a description will necessarily fudge important steps. For example, if I describe myself walking as "I put one foot in front of the other" - have I explained enough about walking that it can now be reproduced mechanically? Experience teaches that -no.

I believe humans do look up tables. If pattern X, then deny possibilities Y and Z. And then you more or less consciously choose one of the remaining options.

If you can avoid making "obvious" (in retrospect) mistakes, you will be a competitive chess or go player.

And given that we have millions of neurons to suppress activity in one node if activity was high enough in another node, I would summarize the brain as effectively a huge chain of look up tables.

I'm not sure why YeGoblynQueenne thinks this is such a mystery. (This is not the first time I've been puzzled by their pessimism on HN.) There is no mystery here: AlphaZero shows that you can get superhuman performance by searching only a few ply by sufficiently good pattern recognition in a highly parameterized and well-trained value function, and MuZero makes this point even more emphatically by doing away with the formal search entirely in favor of an more abstract recurrent pondering. What more is there to say?
>> (This is not the first time I've been puzzled by their pessimism on HN.)

I don't understand why you keep making personal comments like that about me. I suspect you don't realise that they are unpleasant. Please let me make it clear: such personal comments are unpleasant. Could you please stop them? Thank you.

MuZero performs a "formal search". In many more ways than one, for example optimisation is still a search for an optimal search of parameters. But I guess you mean that it doesn't perform a tree search? Quoting from the abstract of the paper on arxiv [1]:

In this work we present the MuZero algorithm which, by combining _a tree-based search_ with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, withoutany knowledge of their underlying dynamics.

(My underlining)

If I remember correctly, MuZero is model-free in the sense that it learns its own evaluation function and reward policy etc (also going by the abstract). But it retains MCTS.

Indeed, it wouldn't really make sense to drop MCTS from the architecture of a system designed to play games. I mean, it would be really hard to justify discarding a component that is well known to work and work well, both from an engineering and a scientific point of view.

_________________

https://arxiv.org/abs/1911.08265

MuZero doesn't do a search over explicit board states, MCTS (where are the playouts at the leaf nodes? where is the simulator state?) or otherwise: it does search over internal-abstract-reward-predictive-states/action pairs, much like a human does, who thinks through possible actions using an internal representation sometimes backtracking as they decide a move is bad and evaluating a different one which feels better. It's search over a tree of possible actions, sure, but this is not MCTS, even if they loosely use the phrase in places (similarly, the tree search Zero does for gameplay is not MCTS, even if people sometimes describe it that way or conflate it with the training).
I'm a bit confused. MuZero doesn't do a "formal search" (as per your previous comment) but it does a tree search (as per your current comment). It doesn't perform MCTS, even though the paper itself states it performs MCTS.

I quote from the arxiv paper again:

Appendix B Search

We now describe the search algorithm used by MuZero. Our approach is based upon Monte-Carlo tree search with upper confidence bounds, an approach to planning that converges asymptotically to the optimal policy in single agent domains and to the minimax value function in zero sum games [22].

So I'm sorry but I really don't understand what you mean here.

Also, as discussed in other comments, it's dangerous to assume anyhing about how "a human does" anything to do with any kind of mental calculation. Whatever MuZero does and however good or bad it does it, there is nothing to tell us that it does it as a human does.

> Are we certain that well-trained human players are not doing search?

Some, but there's a LOT more context pruning the search space.

Watch some of the chess grandmasters play and miss obvious winning moves. Why? "Well, I didn't bother looking at that because <insert famous grandmaster> doesn't just hang a rook randomly."

At least in chess, if it is not the search, then it is probably the evaluation function.

Expert players have likely a very well-tuned evaluation function of how strong a board "feels". Some of it is explainable easily: center domination, diagonal bishop, connected pawn structure, rook supporting pawn from behind, others are more elaborate, come with experience and harder to verbalize.

When expert players play against computers, the limitation of their evaluation function becomes visible. Some board may feel strong, but you are missing some corner case that the minmax search observes and exploits.

I like to caution against taking concepts from computer science and AI and applying them directly to the way the human mind works. Unless we know that a player is applying a specific evaluation function (e.g. because they tell us, or because they vocalise their thought process etc) then even suggesting that "players have an evaluation function" is extrapolating far from what it is safe. For one thing- what does a "function" look like in the human mind?

Whatever human minds do, computing is only a very general metaphor for it and it's very risky to assume we understand anything about our mind just because we understand our computers.

> No, no. The biggest question in the field is not one that is answered by "a deeper search". The biggest question is "how can we do that without a search"?

My guess is that we're doing pattern recognision, where we recognize taht a current game state is similar to a situation that we've been in before (in some previous game), and recall the strategy we took and the outcomes it had lead to. With large enough body of experience, you can to remember lots of past attempted strategies for every kind of game state (of course, within some similarity distance).

This insight is the essence of the AlphaZero architecture. Whereas a pure Monte Carlo Tree Search (MCTS) starts each node in the search tree with a uniform distribution over actions, AlphaZero trains a neural network to observe the game state and output a distribution over actions. This distribution is optimized to be as similar as possible to the distribution obtained from running MCTS from that state in the past. It's very similar to the way humans play games.