Nice idea. But in this case I need to verify that it is the same robot every time, because a robot could do nothing, or play like a noob, in the first 9 games, and then the last game could be played by the user through the UI.
And you can't verify this, because there are different "worlds", and a robot may simply not be good enough to solve some game problem in some random world.
> To reduce the subjectivity of scoring in major meets, panels of five or seven judges are assembled. If five judges then the highest and lowest scores are discarded and the middle three are summed ...
Also, make them play 100 games, and drop 10 of them.
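A rough sketch of how that kind of trimmed scoring could look, in case it helps. The function name `trimmed_total` and the 5 lowest / 5 highest split for the 100-game case are my own assumptions; the suggestion only says "drop 10" and does not say which ones.

```python
def trimmed_total(scores, drop_low, drop_high):
    """Sum the scores that remain after discarding the lowest
    `drop_low` and the highest `drop_high` results.

    Hypothetical helper for illustration, not part of any real game API.
    """
    if drop_low < 0 or drop_high < 0:
        raise ValueError("drop counts must be non-negative")
    if drop_low + drop_high >= len(scores):
        raise ValueError("dropping this many scores leaves nothing to sum")
    ordered = sorted(scores)
    # Keep the middle slice: skip the lowest and the highest entries.
    kept = ordered[drop_low:len(ordered) - drop_high]
    return sum(kept)


# Judges' panel from the quote: five scores, drop the highest and the lowest.
panel = [7.5, 8.0, 8.5, 9.0, 6.0]
panel_total = trimmed_total(panel, 1, 1)

# 100 games, dropping 10 of them (assumed here: 5 worst and 5 best).
game_scores = list(range(100))  # made-up placeholder scores
games_total = trimmed_total(game_scores, 5, 5)
```

Dropping from both ends means one suspiciously good game (say, one played by hand through the UI) or one unlucky world does not move the total much, although it does not prove that the same robot played every game.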