Hacker News
by glinscott 3062 days ago
We are using data from both human grandmaster games and self-play games from a recent Stockfish version. Both have produced networks that play reasonable openings, but we had some issues with the value head not recognizing good positions. We think we have a lead on why this is happening (too few weights in the final stage of the network), but catching problems like this is exactly the purpose of the supervised learning debugging phase :).
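
For a rough sense of what "too few weights in the final stage" can mean, here is a back-of-the-envelope sketch that counts the parameters of an AlphaZero-style value head (1x1 convolution, then a hidden fully connected layer, then a scalar output). The function name and every size below (64 trunk channels, 1 vs. 32 value-head filters, 256 hidden units) are assumptions chosen for illustration, not the project's actual configuration, and batch-norm parameters are ignored.

```python
def value_head_params(trunk_channels, conv_filters, hidden_units, squares=64):
    """Count weights and biases in a hypothetical AlphaZero-style value head.

    Layout assumed here: 1x1 conv -> flatten over the board -> dense -> scalar.
    Batch-norm parameters are deliberately left out to keep the arithmetic simple.
    """
    # 1x1 convolution: one weight per (input channel, output filter) plus biases.
    conv = trunk_channels * conv_filters + conv_filters
    # First dense layer sees every square of every value-head filter.
    fc1 = squares * conv_filters * hidden_units + hidden_units
    # Final dense layer maps the hidden units to a single evaluation.
    fc2 = hidden_units + 1
    return conv + fc1 + fc2


# Illustrative sizes only (assumptions, not taken from the project).
narrow = value_head_params(trunk_channels=64, conv_filters=1, hidden_units=256)
wide = value_head_params(trunk_channels=64, conv_filters=32, hidden_units=256)

print(f"narrow value head: {narrow} parameters")
print(f"wide value head:   {wide} parameters")
```

With a single value-head filter, the whole board is squeezed into 64 numbers before the dense layers, so widening that stage is one plausible way to give the value head more capacity.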