Hacker News new | ask | show | jobs
by kilburn 878 days ago
You also need to account for:

- Whose turn is it

- If the king is in its initial position, whether the king has moved previously (that player can't castle anymore).

- For the pawns, you may need the previous position/move to allow en-passant capturing [1]

The standard specification used for this is the Forsyth–Edwards Notation (FEN). [2] It was invented way earlier than computers, so it was targeting humans / isn't optimized for digital compression, but it at least specifies _almost_ everything that you need to track.

The _almost_ is because to account for the repetition rule [2] you need to track all previous positions. At this point, the problem changes from "how to efficiently store the positions of the pieces in the board" to "how to efficiently store (partial) games", and you are probably better off listing the moves than the board state.

[1] https://en.wikipedia.org/wiki/En_passant

[2] https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...

[3] https://en.wikipedia.org/wiki/Threefold_repetition

2 comments

I love bit-bashing to squeeze something down to the smallest representation possible. There was a related discussion here on HN recently, where I came up with this:

A good representation needs to keep the full game history due to the no-repeat rule, and also because this is generally useful. There are only 32 pieces so a 5-bit number can select one uniquely. Each piece has a maximum of about 32 positions it can move to on its turn, for another 5 bits. Then the encoding is just 10 bits per ply. Typical games are 40 moves (80 ply) and hence require just 800 bits or 100 bytes for the whole game history. Some additional data is required to track promotions, but that's it.

Then you could get clever with Huffman coding or the like, since some moves are more common than others. E.g.: on the first turn, only the pawns or the knights can move. For quite a few turns, pieces either can't move or are very restricted in where they can move to. Pawns always have so few moves available that their next move could be represented with just 2 bits instead of the full 5. Etc...

In practice, it ought to be possible to get the typical standard game representation down to 8 bits per move or less, especially for short games.

To represent games starting from arbitrary configurations, there needs to also be included 32 6-bit numbers to encode where each of the unique pieces is on the 64-square board. This adds just 24 bytes for this initial "setup".

You could generate a list of legal moves with a specific algorithm, and use the necessary number of bits to indicate which one happened.
Just encoding algebraic chess notation gets you to 9 bits without much skill: 3 bits for the 6 pieces, 3 bits for rank and 3 bits for file, with the edge cases shoved in the extra encoding space given that you have 3 bits to encode 6 pieces.

On average, chess has a branching factor of 30-40, so you need a touch more than 5 bits on average even with clever Huffman coding. Just Huffman coding regular algebraic chess notation for movement would probably get you down to 6 bits on average.

Of course, the other issue is that the tighter you compress the data, the harder it is to actually manipulate that data, and it's not clear that the space saving for denser encodings is worth it.

That’s a lot of extra wasted bits! There are only 32 pieces in total, and only one of 16 can move each turn. So a mere 4 bits is sufficient to select the moved piece.

The assumption here is that this is an encoding “at rest” and isn’t used directly, except perhaps for equality comparisons. The decoder would have to track the game state and expand this into a much larger format for programmatic manipulation.

>To represent games starting from arbitrary configurations, there needs to also be included 32 6-bit numbers to encode where each of the unique pieces is

In that case, you can either omit pieces not in play or omit pieces in their standard starting position to shave off a few more bits.

I remembered current turn later but I could not think of your 2nd and 3rd point, you are right

I guess you can't just take a look at board to see current game state, I was too focused on what game looks like while there are additional off the board info

https://news.ycombinator.com/item?id=39066345

You could probably modify this solution, I am already using 3 bits for piece type

0-3 rook queen bishop knight

4 Non-moved king (can castle)

5 moved king

6 pawn that moved 2 squares last turn (can be captured using en passant)

7 other pawns

And then a single extra bit to store whose turn it is

instead of encoding "moved king", encode "moved rook" on each rook. Since a pawn can never be on its home rank, overlap rook (not moved) & pawn (en passant) codes. Now you have a code for "king (to move)" and "king (not to move)".

If you use a variable length encoding you can also use different static probabilities for the different ranks, since "rook (not moved)" and "pawn (en passant)" can only appear on certain ranks and only depending on the piece color (well, pawns can't appear on ranks 1 or 8 at all)

You’d need the same information for rooks to handle castling.