If you just want to learn a model of chess games in algebraic notation (is that what it's called? I don't play chess), then you don't need to train a Transformer. That would be overkill, and you wouldn't really be able to train it very well. I mean, unless you have a few petaflops of compute lying around. You could instead start with a smaller, traditional model, like an n-gram model, a Hidden Markov Model (HMM) or a Probabilistic Context-Free Grammar (PCFG). The advantage of such smaller models is that they don't need billions of parameters to get good results, and you'll get more bang for your buck out of the many, many, many examples of games you can find.

But don't expect to get very far. A system that only learns to predict the best move will never beat a system that looks ahead a few dozen ply with alpha-beta minimax, or one that plays out entire game trees, like Monte Carlo Tree Search. Well, unless you do something silly to severely hobble the search-based system, or train the predictive model on all possible chess games. Which, I mean, is theoretically possible: you just need to build out the entire chess game tree :P You could also try a simpler game: Tic-Tac-Toe should be amenable to a predictive modelling approach. So should simpler checker-board games like hexapawn. Or even checkers, which is, after all, solved.

But my question is, what would you hope to achieve with all this? What is the point of training a predictive model to play chess? Hasn't this been tried before, and shown to be no good compared to a search-based approach? If not, I'd be very surprised to find that out, and there might be some merit in testing the limits of the predictive approach. But it's going to be limited, alright.
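To make the "start with an n-gram model" suggestion concrete, here is a minimal sketch, assuming each game is already a list of move strings in algebraic notation. The class name `NGramMoveModel` and the toy games are hypothetical, made up for illustration; real training data would be thousands of games from a database. The model just counts which move followed each context of the previous n-1 moves and predicts the most frequent one. It does no search and doesn't even check move legality, which is exactly the limitation discussed above.

```python
from collections import Counter, defaultdict


class NGramMoveModel:
    """Hypothetical n-gram next-move predictor over algebraic-notation move tokens."""

    def __init__(self, n=3):
        self.n = n
        # Maps a context (tuple of the previous n-1 moves) to counts of the next move.
        self.counts = defaultdict(Counter)

    def train(self, games):
        for moves in games:
            # Pad with start-of-game markers so opening moves also get a context.
            padded = ["<s>"] * (self.n - 1) + list(moves)
            for i in range(self.n - 1, len(padded)):
                context = tuple(padded[i - self.n + 1:i])
                self.counts[context][padded[i]] += 1

    def predict(self, history):
        # Use only the last n-1 moves (plus padding) as context.
        padded = ["<s>"] * (self.n - 1) + list(history)
        context = tuple(padded[len(padded) - (self.n - 1):])
        options = self.counts.get(context)
        if not options:
            return None  # unseen context: a pure n-gram model has nothing to say
        # Most frequent continuation; note there is no legality check or lookahead.
        return options.most_common(1)[0][0]


# Toy, made-up games for illustration only.
games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["e4", "e5", "Nf3", "Nc6", "Bc4"],
    ["e4", "c5", "Nf3", "d6"],
]
model = NGramMoveModel(n=3)
model.train(games)
print(model.predict(["e4", "e5"]))  # most frequent reply seen after 1.e4 e5
```

With n=3 the prediction depends only on the last two moves, so any position reached by an unseen two-move sequence returns `None`. An HMM or PCFG would generalise somewhat better, but none of them look ahead the way alpha-beta or MCTS do.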
Could you elaborate a bit more on why you think training a Transformer only on chess moves (in algebraic notation, yes; algebraic notation is the one that says <piece><square>, roughly speaking) wouldn't work? I'm not sure I understand.
As for your question, I don't really have a good answer. I've just been working on my own crazy chess AI ideas for a long while now, and I was taken aback by the fact that GPT seems able to occasionally "find" long tactical sequences even in positions that have not occurred before in known games. So it seemed only natural to think deeply about whether it represents some nugget of something useful, maybe even a fundamentally new approach. But I have serious doubts, as I explained in GP.
It's also just been an interesting angle for me to understand what LLMs are doing, because I'm deeply familiar with chess and with methods of thinking about it, both human and artificial. There's a lot more for me to grab onto than with any other application when it comes to demystifying its behaviour.