by veselin
2355 days ago
GPT-2 is byte pair encoding (BPE) plus a transformer. Is there any indication that BPE plays a role here, given that the vocabulary of chess moves is fixed? If not, then only the transformer is interesting, and this post is just borrowing the model's name because it sounds cool. Giving moves directly to the transformer might actually improve the results.
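To make "giving moves directly to the transformer" concrete, here is a minimal sketch of a fixed, move-level vocabulary in which every UCI coordinate move is exactly one token, so no BPE merges are involved. It is hypothetical, not what the post trained on: it includes every from/to pair a queen or knight could make on an empty board (which covers every legal non-promotion move, including castling written as e1g1), plus promotion moves, plus two assumed special tokens `<pad>` and `<eos>`.

```python
FILES = "abcdefgh"
RANKS = "12345678"


def build_move_vocab():
    """Hypothetical fixed vocabulary: one token per possible UCI move string."""
    squares = [(f, r) for r in range(8) for f in range(8)]
    moves = []
    for f1, r1 in squares:
        for f2, r2 in squares:
            df, dr = abs(f1 - f2), abs(r1 - r2)
            if (df, dr) == (0, 0):
                continue
            queen_like = df == 0 or dr == 0 or df == dr  # rank, file, or diagonal
            knight_like = {df, dr} == {1, 2}
            if queen_like or knight_like:
                moves.append(FILES[f1] + RANKS[r1] + FILES[f2] + RANKS[r2])
    # Promotions: a pawn steps or captures from rank 7 to 8 (White) or 2 to 1 (Black).
    for from_rank, to_rank in ((6, 7), (1, 0)):
        for f1 in range(8):
            for f2 in (f1 - 1, f1, f1 + 1):
                if 0 <= f2 < 8:
                    for piece in "qrbn":
                        moves.append(
                            FILES[f1] + RANKS[from_rank] + FILES[f2] + RANKS[to_rank] + piece
                        )
    specials = ["<pad>", "<eos>"]  # assumed special tokens
    return {tok: i for i, tok in enumerate(specials + moves)}


def encode(game, vocab):
    """Map a space-separated UCI game to token ids, one id per move."""
    return [vocab[m] for m in game.split()]


vocab = build_move_vocab()
print(len(vocab))  # 1970
print(len(encode("e2e4 e7e5 g1f3", vocab)))  # 3
```

The vocabulary has 1,970 entries (1,792 from/to pairs, 176 promotions, 2 specials), far smaller than GPT-2's 50,257-token vocabulary. Using it would mean training a new embedding table rather than reusing the pretrained one.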
You're right that coming up with a token mapping could help. It's a bit tricky to do that right now. Your options for fitting a custom vocab seem to be: use sentencepiece to fit a vocab, then modify the GPT-2 codebase to use the sentencepiece library for decoding.
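One reason a custom mapping could matter: GPT-2's encoder splits text with a regex before applying any BPE merges, and that regex separates runs of letters from runs of digits. The sketch below uses an ASCII-only approximation of the pattern in OpenAI's encoder.py (the original uses the third-party `regex` module with `\p{L}` and `\p{N}` classes). Under that approximation, a coordinate move like e2e4 can never become a single token, whatever merges vocab.bpe contains.

```python
import re

# ASCII-only approximation of GPT-2's pretokenization pattern. The original uses
# \p{L} (letters) and \p{N} (numbers); here they are narrowed to [A-Za-z] and
# [0-9], which is enough for chess move text.
GPT2_PAT_ASCII = re.compile(
    r"""'s|'t|'re|'ve|'m|'ll|'d| ?[A-Za-z]+| ?[0-9]+| ?[^\sA-Za-z0-9]+|\s+(?!\S)|\s+"""
)


def pretokenize(text):
    """Split text into the chunks BPE sees; merges never cross chunk boundaries."""
    return GPT2_PAT_ASCII.findall(text)


print(pretokenize("e2e4 e7e5"))
# ['e', '2', 'e', '4', ' e', '7', 'e', '5']
print(pretokenize("1. e4 e5 2. Nf3"))
# ['1', '.', ' e', '4', ' e', '5', ' 2', '.', ' Nf', '3']
```

With the stock encoder, then, each coordinate move costs at least four tokens, which is one concrete way BPE shapes the input even when the underlying move vocabulary is fixed.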
I'm honestly not sure whether sentencepiece's output is compatible with the traditional encoder. It doesn't seem to generate an encoder.json + vocab.bpe pair; it writes its own .model and .vocab files instead. So I'm not sure the tooling that has evolved around OpenAI's encoder format would apply there. I really don't know, though.
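For readers unfamiliar with the two files: in OpenAI's GPT-2 code, vocab.bpe is an ordered list of merge rules (after a `#version` header line) and encoder.json maps each resulting token string to an integer id. Here is a minimal sketch of how they are applied to one pre-token, modeled on the `bpe` method in encoder.py. The toy merges and ids are hypothetical, and the byte-to-unicode remapping the real encoder performs first is omitted (it leaves printable ASCII such as Nf3 unchanged).

```python
def parse_merges(bpe_text):
    """Parse vocab.bpe-style text: a '#version' header, then one 'left right'
    merge per line. Earlier lines get a lower rank, i.e. higher priority."""
    lines = [ln for ln in bpe_text.splitlines() if ln and not ln.startswith("#version")]
    return {tuple(ln.split()): rank for rank, ln in enumerate(lines)}


def get_pairs(word):
    """All adjacent symbol pairs in a tuple of symbols."""
    return {(word[i], word[i + 1]) for i in range(len(word) - 1)}


def bpe(token, ranks):
    """Repeatedly merge the lowest-ranked adjacent pair until none applies."""
    word = tuple(token)
    while len(word) > 1:
        pair = min(get_pairs(word), key=lambda p: ranks.get(p, float("inf")))
        if pair not in ranks:
            break  # no known merge left
        first, second = pair
        new_word, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and word[i] == first and word[i + 1] == second:
                new_word.append(first + second)
                i += 2
            else:
                new_word.append(word[i])
                i += 1
        word = tuple(new_word)
    return list(word)


# Hypothetical toy files, not GPT-2's real ones.
TOY_VOCAB_BPE = "#version: 0.2\nN f\nNf 3\n"
TOY_ENCODER = {"N": 0, "f": 1, "3": 2, "Nf": 3, "Nf3": 4}

ranks = parse_merges(TOY_VOCAB_BPE)
tokens = bpe("Nf3", ranks)
ids = [TOY_ENCODER[t] for t in tokens]
print(tokens, ids)  # ['Nf3'] [4]
print(bpe("f3", ranks))  # ['f', '3']
```

A sentencepiece model bundles its own vocabulary and segmentation in a different file format, so it would not drop into this loop without conversion.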
According to this slatestarcodex comment, someone got better results using only pure coordinate notation (the UCI style, which writes g1f3 instead of Nf3): https://www.reddit.com/r/slatestarcodex/comments/el87vo/a_ve...
Another extension that might help is to periodically inject the full FEN board state. This was the format we were going to try next, which injects the full FEN after every move: https://gist.github.com/shawwn/318606c112774ad070f94de9c8288...
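A minimal sketch of that idea, assuming moves are in UCI coordinate notation and the game is legal (nothing is validated). It tracks only the FEN piece-placement field and the side to move, not castling rights, the en passant target, or move counters, and the one-line-per-move layout is a hypothetical stand-in for the gist's format. A library such as python-chess would produce complete FEN strings more robustly.

```python
START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"


def parse_placement(placement):
    """Return board[rank][file]; rank 0 is rank 1, file 0 is the a-file."""
    board = []
    for row in reversed(placement.split("/")):  # FEN lists rank 8 first
        cells = []
        for ch in row:
            cells.extend([None] * int(ch) if ch.isdigit() else [ch])
        board.append(cells)
    return board


def to_placement(board):
    """Serialize the board back to the FEN piece-placement field."""
    rows = []
    for cells in reversed(board):
        out, empty = "", 0
        for piece in cells:
            if piece is None:
                empty += 1
                continue
            if empty:
                out += str(empty)
                empty = 0
            out += piece
        rows.append(out + (str(empty) if empty else ""))
    return "/".join(rows)


def square(name):
    """'e2' -> (rank index, file index) = (1, 4)."""
    return int(name[1]) - 1, ord(name[0]) - ord("a")


def apply_uci(board, move):
    """Apply a UCI move such as 'e2e4', 'e1g1' or 'e7e8q' (no legality checks)."""
    fr, ff = square(move[:2])
    tr, tf = square(move[2:4])
    piece = board[fr][ff]
    if piece is None:
        raise ValueError(f"no piece on {move[:2]}")
    # En passant: a pawn moving diagonally onto an empty square captures beside it.
    if piece in "Pp" and ff != tf and board[tr][tf] is None:
        board[fr][tf] = None
    # Castling: a king moving two files also moves the corresponding rook.
    if piece in "Kk" and abs(tf - ff) == 2:
        rook_from, rook_to = (7, 5) if tf > ff else (0, 3)
        board[fr][rook_to] = board[fr][rook_from]
        board[fr][rook_from] = None
    board[fr][ff] = None
    if len(move) == 5:  # promotion: uppercase for White, lowercase for Black
        piece = move[4].upper() if piece.isupper() else move[4].lower()
    board[tr][tf] = piece


def interleave_fen(moves):
    """Hypothetical format: each move followed by the resulting placement and side to move."""
    board = parse_placement(START)
    lines = []
    for i, move in enumerate(moves):
        apply_uci(board, move)
        side = "b" if i % 2 == 0 else "w"  # side to move after this ply
        lines.append(f"{move} {to_placement(board)} {side}")
    return "\n".join(lines)


game = "e2e4 e7e5 g1f3 b8c6 f1c4 g8f6 e1g1".split()
out = interleave_fen(game)
print(out.splitlines()[0])
# e2e4 rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b
```

Injecting the state after every move makes sequences several times longer, so a sparser schedule (every N moves) is an obvious variant to try.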
I'm so happy to get to work with GPT-2 1.5B. It's been a lot of fun to train.
By the way, if you like this kind of thing, you'll love Elo World. https://www.youtube.com/watch?v=DpXy041BIlA