Hacker News
by CJefferson 2731 days ago
I strongly suspect AlphaZero would be easily beatable once you had your hands on it. This is just from experience: most neural-network-style systems are weak against adversarial opponents who understand their internals.

Of course I can't be sure, because Google refuses to give anyone access to AlphaZero or to a network trained with it. Personally, that gives me more confidence that they know there are significant exploitable weaknesses.

3 comments

No need to wait for AlphaZero, you can try Leela Chess Zero today. From my experience the network without search has some blind spots, but the tree search is pretty effective in fixing them.
Adversarial? If the model exclusively trains against itself, you can’t really insert anything there. Do you mean, play confusing moves at the beginning of the game?
I mean, if we had the network, it would be easy to beat, the same way you can confuse image recognition systems with very minor changes.
The way adversarial examples trick image recognition systems is that they vary the pixels of the input image slightly to manipulate the output of the neural network.
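To make the pixel-perturbation idea concrete, here is a minimal sketch on a toy linear "classifier". The weights, pixel values, and the helper names `score` and `perturb` are all made up for illustration; real attacks target deep networks and compute the gradient numerically, but the sign-of-gradient step has the same shape.

```python
# Toy illustration only: a hypothetical linear "classifier" over 8 pixel values.
# All numbers below are invented for the example.

def score(weights, pixels):
    """Positive score means class A, negative score means class B."""
    return sum(w * p for w, p in zip(weights, pixels))

def sign(x):
    return (x > 0) - (x < 0)

def perturb(weights, pixels, eps):
    """Shift every pixel by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to the pixels
    is the weight vector itself, so this is a sign-of-gradient step.
    """
    return [p - eps * sign(w) for w, p in zip(weights, pixels)]

weights = [0.5, -0.3, 0.8, -0.6, 0.2, -0.9, 0.4, -0.7]
pixels = [0.9, 0.5, 0.5, 0.5, 0.5, 0.4, 0.5, 0.4]

adv = perturb(weights, pixels, eps=0.05)

print("original score: ", score(weights, pixels))  # positive: class A
print("perturbed score:", score(weights, adv))     # negative: class B
```

No pixel moves by more than 0.05, yet the predicted class flips, because many small nudges all push the score the same way. This relies on the input being continuous, which is the property the replies below argue a chess or Go board lacks.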

For AlphaZero, the input is the board, which you can't manipulate arbitrarily. You can run your own evaluation of the board after a move, see whether it's significantly different from the evaluation that AlphaZero comes up with, and maybe try to exploit that. But if you have a better evaluation of some state than AlphaZero does, you're likely a stronger player anyway, so this extra step is unnecessary. Most of the value of the bot comes from its board evaluation function, along with some hyper-parameters. But the evaluation is probably the most important part and the most difficult to replicate.
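The "compare evaluations" idea can be sketched as a simple scan for positions where two evaluators disagree strongly. Everything here is hypothetical: `find_disagreements`, the position labels, and the evaluation numbers are placeholders, and a real experiment would plug in an actual engine and a trusted reference evaluator.

```python
def find_disagreements(positions, engine_eval, reference_eval, threshold):
    """Return (position, engine value, reference value) tuples whose
    evaluations differ by more than threshold, largest gap first."""
    flagged = []
    for pos in positions:
        engine_value = engine_eval(pos)
        reference_value = reference_eval(pos)
        gap = abs(engine_value - reference_value)
        if gap > threshold:
            flagged.append((gap, pos, engine_value, reference_value))
    flagged.sort(reverse=True)
    return [(pos, e, r) for _, pos, e, r in flagged]

# Made-up evaluations in [-1, 1] for three placeholder positions.
ENGINE = {"position_1": 0.10, "position_2": 0.65, "position_3": -0.20}
REFERENCE = {"position_1": 0.12, "position_2": -0.30, "position_3": -0.25}

suspects = find_disagreements(ENGINE, ENGINE.get, REFERENCE.get, threshold=0.5)
for pos, engine_value, reference_value in suspects:
    print(pos, engine_value, reference_value)
```

A flagged position only tells you that the two evaluators disagree, not which one is wrong, which is the comment's point: the scan is only useful if the reference is already the better judge.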

That doesn't follow. For you to confuse it, you need to change the inputs. For images, this is fine: we can smoothly change lots of little things. For chess or Go you don't have that freedom.

You can download the weights for LCZero right now though and try out your theory. https://github.com/LeelaChessZero/lc0/wiki/Getting-Started

You are right, I should try. I'll see if I can find time in the new year.

I'd prefer to try with a Go player because, as you say, in chess it's hard to exactly control the input to the network; it's easier in Go.

Here's a Go setup: https://github.com/gcp/leela-zero

The current best weights are available. It's not AlphaZero, but I would expect any issues to be general: if there are issues with Leela Zero they may transfer, and if you don't see issues with Leela Zero they're unlikely to exist in AlphaZero (or, if they do, they may be very particular to subtle training differences).

Would be very interested to see what you find if you get the chance.

You can change the inputs: it depends on when (ply) and which move you play. Some moves are uncommon enough to make it possible for you to uncover something?
You absolutely can change the inputs, but the point I wanted to make is that, unlike images, where you can make human-irrelevant changes, you can't really do that with chess or Go.

If you want to construct a particular position on the board, you'd likely need multiple steps and would need the AI to play very particular moves along the way, and the payoff would be a single move from the AI. Even then, a single incorrect classification doesn't help all that much; you need your opponent to make repeated mistakes.

I think in reality, if you uncovered a type of move it wasn't expecting, you would likely have uncovered a new strategy in general rather than a trick. Image classification, however, lets you experiment uninterrupted with tiny pixel value changes, and you only need a single incorrect output to "win".

I suspect it's a bit harder for the network to be overfit like this, but it's probably possible that it has some gaps in its evaluation. However, those gaps would have to persist beyond its search horizon and not concretely affect material or mobility, and it just seems vanishingly unlikely that you'll find any systematic way to exploit anything.
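The search-horizon point can be illustrated with a toy minimax sketch. The tree, the static evaluation numbers, and the function names are invented for the example and have nothing to do with how AlphaZero or Leela actually search (they use Monte Carlo tree search guided by the network); the only claim is that looking one ply deeper can override a bad static evaluation.

```python
# Toy game tree: each position maps to its child positions (leaves have none).
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}

# Hypothetical static evaluations from the root player's point of view.
# "b" is the blind spot: it looks great statically, but the reply "b1" refutes it.
STATIC = {"a": 0.2, "b": 0.9, "a1": 0.3, "a2": 0.2, "b1": -0.8, "b2": 0.9}

def minimax(node, depth, maximizing):
    """Plain depth-limited minimax over TREE, using STATIC at the horizon."""
    children = TREE.get(node, [])
    if depth == 0 or not children:
        return STATIC[node]
    values = [minimax(child, depth - 1, not maximizing) for child in children]
    return max(values) if maximizing else min(values)

def best_move(depth):
    """Pick the root move whose searched value is highest (depth >= 1)."""
    return max(TREE["root"], key=lambda c: minimax(c, depth - 1, maximizing=False))

print("depth 1:", best_move(1))  # trusts the static evaluation of "b"
print("depth 2:", best_move(2))  # sees the refutation and prefers "a"
```

At depth 1 the flawed evaluation of "b" decides the move; at depth 2 the refutation is inside the horizon and the mistake disappears. An exploitable gap would have to be one whose refutation stays outside whatever depth the real search reaches.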
I guess if you understand the internals of an NN you can just write a paper and publish it.

Generating the right noise has been shown to be successful against NNs (https://blog.openai.com/adversarial-example-research/), but I am not sure how you could apply that in this context.