Not the author, but ChessBase sell a product (Mega Database) which includes 85,000 annotated games in a more-or-less machine-readable format. [0]
To me it's probably OK to train a model on this, at least for hobby purposes, though some GitHub Copilot critics might disagree. And a large part of ChessBase's business model is based on ripping off other people's IP and presenting it as their own [1]. But still, I can see why the author might want to be coy about answering this question.
Seconded. I looked around the writeup site for a bit and couldn't figure that out. That's arguably the most important piece of info about this project.
[0] https://en.chessbase.com/post/new-mega-database-2021
[1] https://lichess.org/blog/YCvy7xMAACIA8007/fat-fritz-2-is-a-r...