Hacker News
by veselin 2355 days ago
GPT-2 is byte pair encoding plus a transformer. Is there any indication that BPE plays any role here, given that the vocabulary is fixed? If not, then only the transformer is interesting, and this post is just using the name of the model because it sounds cool. Actually, giving moves directly to the transformer might improve the results.

It's unknown what role, if any, BPE plays. I was surprised to discover that the final probability of a move appears to be just the combined probability of its individual tokens from the root prompt, i.e. even though "1.Nf3 e5" is encoded as ['1', '.', 'N', 'f', '3', ' e', '5'], the probability of e5 seems unaffected by the fact that Nf3 is 3 tokens as opposed to one.
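
One way to read that observation is through the chain rule: in an autoregressive model, the probability of a multi-token move is the product of the conditional probabilities of its tokens. The sketch below only illustrates that arithmetic. The per-token probabilities are made-up numbers, not model output, and `sequence_logprob` is a hypothetical helper name.

```python
import math

def sequence_logprob(token_logprobs):
    """Chain rule: the log-probability of a token sequence is the sum
    of the conditional log-probabilities of its tokens."""
    return sum(token_logprobs)

# Hypothetical conditional probabilities for the tokens 'N', 'f', '3'
# given everything before them. Illustrative values only.
nf3_token_probs = [0.5, 0.8, 0.9]

move_logprob = sequence_logprob(math.log(p) for p in nf3_token_probs)
move_prob = math.exp(move_logprob)  # same as 0.5 * 0.8 * 0.9
```

Summing log-probabilities rather than multiplying raw probabilities is the usual choice here, since it avoids underflow on long sequences.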

You're right that coming up with a token mapping could help things. It's a bit tricky to do that right now. Your options for fitting a custom vocab seem to be "use sentencepiece to fit a vocab, then modify the gpt-2 codebase to use the sentencepiece library for decoding".

I am honestly not sure if the output of sentencepiece is compatible with traditional encoders. What I mean is, it doesn't seem to generate an encoder.json + vocab.bpe file. It seemed to be some other kind of format. So I'm not sure if the tooling that has evolved around OpenAI's encoder format would be applicable there. I really don't know, though.
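
For what a move-level token mapping could look like, here is a minimal hand-rolled sketch in plain Python. It is not the sentencepiece route and does not produce encoder.json or vocab.bpe; it just assigns one integer id per distinct move string. All names (`split_moves`, `build_move_vocab`, `encode_moves`) are hypothetical, and wiring such ids into the gpt-2 codebase is not shown.

```python
import re

# Matches a leading move number such as "1." or "12..."
MOVE_NUMBER = re.compile(r"^\d+\.+")

def split_moves(movetext):
    """Split movetext like '1.Nf3 e5 2.c4' into ['Nf3', 'e5', 'c4']."""
    moves = []
    for field in movetext.split():
        move = MOVE_NUMBER.sub("", field)
        if move:  # skip fields that were only a move number, e.g. "1."
            moves.append(move)
    return moves

def build_move_vocab(games):
    """Assign one id per distinct move, in order of first appearance."""
    vocab = {}
    for game in games:
        for move in split_moves(game):
            vocab.setdefault(move, len(vocab))
    return vocab

def encode_moves(movetext, vocab):
    """Map each move to its id; raises KeyError on an unseen move."""
    return [vocab[move] for move in split_moves(movetext)]

games = ["1.Nf3 e5 2.c4", "1.e4 e5 2.Nf3"]
vocab = build_move_vocab(games)
ids = encode_moves("1.Nf3 e5", vocab)
```

With a mapping like this, "Nf3" becomes a single token instead of three, at the cost of having no fallback for moves that never appeared in the data used to build the vocab.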

According to this slatestarcodex comment, someone got superior results using only coordinate-style algebraic notation (which looks like g1f3 instead of Nf3): https://www.reddit.com/r/slatestarcodex/comments/el87vo/a_ve...

Another extension that might help is to periodically inject the full FEN board state. This was the format we were going to try next, which injects the full FEN after every move: https://gist.github.com/shawwn/318606c112774ad070f94de9c8288...
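
As a rough illustration of the idea, and not a reproduction of the format in the gist, the sketch below interleaves each move with the FEN of the position reached after it. The bracketed layout and the name `interleave_fen` are assumptions made for this example, and the FEN strings are supplied by hand; in practice they would come from a chess library.

```python
def interleave_fen(moves, fens):
    """Emit each move followed by the FEN of the resulting position.

    `moves` and `fens` must have the same length; fens[i] is the board
    state after moves[i]. The one-item-per-line layout is hypothetical.
    """
    if len(moves) != len(fens):
        raise ValueError("need exactly one FEN per move")
    lines = []
    for move, fen in zip(moves, fens):
        lines.append(move)
        lines.append(f"[{fen}]")
    return "\n".join(lines)

moves = ["Nf3", "e5"]
fens = [
    # Position after 1.Nf3
    "rnbqkbnr/pppppppp/8/8/8/5N2/PPPPPPPP/RNBQKB1R b KQkq - 1 1",
    # Position after 1...e5 (en passant target square e6 recorded)
    "rnbqkbnr/pppp1ppp/8/4p3/8/5N2/PPPPPPPP/RNBQKB1R w KQkq e6 0 2",
]
training_text = interleave_fen(moves, fens)
```

The point of a format like this is that the model no longer has to reconstruct the board from the move list alone, at the cost of much longer training sequences.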

I'm so happy to get to work with GPT-2 1.5B. It's been a lot of fun to train.

By the way, if you like this kind of thing, you'll love Elo World. https://www.youtube.com/watch?v=DpXy041BIlA