Hacker News new | ask | show | jobs
by jiggawatts 884 days ago
I love bit-bashing to squeeze something down to the smallest representation possible. There was a related discussion here on HN recently, where I came up with this:

A good representation needs to keep the full game history due to the no-repeat rule, and also because this is generally useful. There are only 32 pieces so a 5-bit number can select one uniquely. Each piece has a maximum of about 32 positions it can move to on its turn, for another 5 bits. Then the encoding is just 10 bits per ply. Typical games are 40 moves (80 ply) and hence require just 800 bits or 100 bytes for the whole game history. Some additional data is required to track promotions, but that's it.

Then you could get clever with Huffman coding or the like, since some moves are more common than others. E.g.: on the first turn, only the pawns or the knights can move. For quite a few turns, pieces either can't move or are very restricted in where they can move to. Pawns always have so few moves available that their next move could be represented with just 2 bits instead of the full 5. Etc...

In practice, it ought to be possible to get the typical standard game representation down to 8 bits per move or less, especially for short games.

To represent games starting from arbitrary configurations, there needs to also be included 32 6-bit numbers to encode where each of the unique pieces is on the 64-square board. This adds just 24 bytes for this initial "setup".

3 comments

You could generate a list of legal moves with a specific algorithm, and use the necessary number of bits to indicate which one happened.
Just encoding algebraic chess notation gets you to 9 bits without much skill: 3 bits for the 6 pieces, 3 bits for rank and 3 bits for file, with the edge cases shoved in the extra encoding space given that you have 3 bits to encode 6 pieces.

On average, chess has a branching factor of 30-40, so you need a touch more than 5 bits on average even with clever Huffman coding. Just Huffman coding regular algebraic chess notation for movement would probably get you down to 6 bits on average.

Of course, the other issue is that the tighter you compress the data, the harder it is to actually manipulate that data, and it's not clear that the space saving for denser encodings is worth it.

That’s a lot of extra wasted bits! There are only 32 pieces in total, and only one of 16 can move each turn. So a mere 4 bits is sufficient to select the moved piece.

The assumption here is that this is an encoding “at rest” and isn’t used directly, except perhaps for equality comparisons. The decoder would have to track the game state and expand this into a much larger format for programmatic manipulation.

>To represent games starting from arbitrary configurations, there needs to also be included 32 6-bit numbers to encode where each of the unique pieces is

In that case, you can either omit pieces not in play or omit pieces in their standard starting position to shave off a few more bits.