by cbutner
1718 days ago
The original hope was for this to be a third head on top of the AlphaZero model, but I couldn't think of a way to generate commentary during self-play such that it would gradually improve. Rotating supervised commentary training into the main schedule ended up hurting both sides because the datasets are disjoint. So now the commentary decoder is just trained separately on top of the final primary model. The previous and current game positions are fed into the primary model, and the outputs are taken from the final convolutional layer, just before the value and policy heads. That data, plus the side to play, is then positionally encoded and fed into a transformer decoder. It would be better to use a search tree/algorithm for commentary too, so that tactics could be better understood, but that would need some kind of subjective BLEU equivalent, and metrics like those don't work well for chess commentary. You can see a diagram of the architecture here: https://chrisbutner.github.io/ChessCoach/high-level-explanat...
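To make the data flow above concrete, here is a minimal pure-Python sketch of turning a final-conv-layer feature map plus the side to play into the token sequence a transformer decoder would attend over. It is not the project's code. The sinusoidal encoding, the idea of appending the side to play as one extra token, and the names `sinusoidal_encoding` and `build_decoder_memory` are all assumptions for illustration, and it handles a single 8x8 feature map rather than showing how the previous and current positions are combined.

```python
import math


def sinusoidal_encoding(num_positions, dim):
    """Standard sinusoidal positional encoding.

    Assumption: the comment only says the data is "positionally encoded";
    the actual scheme may differ.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            # Even indices use sine, odd indices use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def build_decoder_memory(features, white_to_play):
    """Flatten an 8x8 grid of channel vectors into 64 square tokens,
    append one side-to-play token, and add positional encodings.

    features: features[rank][file] is a list of floats (the channels).
    Returns a list of 65 token vectors.
    """
    channels = len(features[0][0])
    tokens = [list(features[r][c]) for r in range(8) for c in range(8)]

    # Hypothetical encoding of the side to play: a constant +1/-1 token.
    side = 1.0 if white_to_play else -1.0
    tokens.append([side] * channels)

    encoding = sinusoidal_encoding(len(tokens), channels)
    return [
        [value + offset for value, offset in zip(token, enc)]
        for token, enc in zip(tokens, encoding)
    ]


# Usage: an all-zero feature map with 4 channels, white to move.
features = [[[0.0] * 4 for _ in range(8)] for _ in range(8)]
memory = build_decoder_memory(features, white_to_play=True)
```

In a real model the resulting sequence would be the "memory" input that the decoder cross-attends to while generating commentary tokens.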
|
Actually, I can't figure out from your explanation why you trained the whole network yourself instead of just taking Leela's network and training the commentary head on top of it.
If you wanted to incorporate the search, maybe you could just take the 1800 or so probabilities output by the MCTS and add some layers on top of them before concatenating the result with the other data fed into the transformer.
In either case, this is a fantastic project and perhaps an even more impressive write-up! Congrats and thank you!
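A rough sketch of the search suggestion above, in plain Python: normalize the MCTS visit counts into probabilities, pass them through one dense layer, and append the result as an extra token alongside the other decoder inputs. Everything here is hypothetical. The function names, the single ReLU layer, and the choice to concatenate along the sequence axis are assumptions, and the tiny weights in the usage lines are made up purely to show the shapes.

```python
def visit_probabilities(visit_counts):
    """Normalize per-move MCTS visit counts into a probability vector."""
    total = sum(visit_counts)
    if total <= 0:
        raise ValueError("search produced no visits")
    return [count / total for count in visit_counts]


def dense_relu(x, weights, biases):
    """One fully connected layer with ReLU: max(0, W x + b)."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def add_search_token(memory, visit_counts, weights, biases):
    """Project the search probabilities to the token width and append
    them to the decoder memory as one extra token.

    Assumption: len(weights) equals the width of the tokens in memory.
    """
    probs = visit_probabilities(visit_counts)
    token = dense_relu(probs, weights, biases)
    return memory + [token]


# Usage with toy sizes: 2 legal moves, token width 3.
memory = [[0.0, 0.0, 0.0]]
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.0, -2.0]
extended = add_search_token(memory, [1, 3], weights, biases)
```

With a real policy-sized vector, the dense layer would mostly serve to compress the move probabilities down to the decoder's model dimension.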