The program fails repeatedly and then loads a previous state to try a different move; what you see is the result of multiple trials. It can't reliably get it right on the first try.
Neither can a baseball player. He's been practicing since age 3 to hit that ball.
Is a decade or two of baseball practice really all that different from a computer playing and re-playing the game over and over, looking for an optimal strategy?