	by dane-pgp 3165 days ago
	Surely if they are really starting with "zero", then all the AI is given is the arrangement of stones on the board (which starts empty), plus the opportunity to select a position for its next stone after its opponent has placed one, until the game is over. (Let's assume that a separate piece of software is responsible for determining when the game has finished and who has won.) As such, the only "rules" the AI needs are that it can place only one stone at a time, only in an empty position, and only when it is its own turn.

	To start with "less than zero", though, it would be interesting to see them give the AI a 3D simulation of a room with a simulated Go board and a simulated stone, and a fixed amount of time in which to take its turn. Using only the pixel data from a simulated camera, it could learn to use a simulated arm to place the simulated stone on the board in a legal position. The reward function would just have to say, at the end of each allotted time period, whether a legal move had been made or not, and the AI could bootstrap up from that.
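	To make the two ideas concrete, here is a rough Python sketch, not anything from the actual system. `MinimalGoBoard` and `legal_move_reward` are hypothetical names, and the 19x19 default is my assumption. It enforces only the three constraints listed above (one stone, empty position, own turn) and deliberately leaves out captures, ko, and scoring.

```python
from dataclasses import dataclass, field

EMPTY, BLACK, WHITE = 0, 1, 2
SIZE = 19  # assumption: the comment does not fix a board size


@dataclass
class MinimalGoBoard:
    """Hypothetical 'zero' environment: only the stones and whose turn it is."""
    size: int = SIZE
    to_move: int = BLACK
    grid: list = field(default_factory=list)

    def __post_init__(self):
        # The board starts empty unless a grid was supplied.
        if not self.grid:
            self.grid = [[EMPTY] * self.size for _ in range(self.size)]

    def place(self, player, row, col):
        """Place one stone if the three constraints hold; return True on success."""
        if player != self.to_move:
            return False  # not this player's turn
        if not (0 <= row < self.size and 0 <= col < self.size):
            return False  # off the board
        if self.grid[row][col] != EMPTY:
            return False  # position already occupied
        self.grid[row][col] = player
        self.to_move = WHITE if player == BLACK else BLACK
        return True


def legal_move_reward(before, after, player):
    """'Less than zero' reward: 1.0 if exactly one stone of `player`
    appeared on a previously empty point during the time period, else 0.0."""
    changed = [
        (r, c)
        for r in range(len(before))
        for c in range(len(before[r]))
        if before[r][c] != after[r][c]
    ]
    if len(changed) != 1:
        return 0.0
    r, c = changed[0]
    return 1.0 if before[r][c] == EMPTY and after[r][c] == player else 0.0
```

	In the simulated-arm version, `before` and `after` would be the board states at the start and end of the allotted time, as read by whatever separate software watches the board; the agent itself would only ever see pixels and the resulting 0 or 1.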