Hacker News new | ask | show | jobs
by gowld 2300 days ago
Games are quite small -- under 50 turns, each under 2 bytes in binary, but more if you start with simple text languge before compression. You need to compress each game individually. Would database-wide compression work?
2 comments

You do not need to compress every game individually. Behind the scenes, your mainstream database is storing the data in pages, and these pages - containing many adjacent rows - can be compressed.

The compression level achieved at a page level is much higher than compressing rows individually because there is lots of repetition between rows. It also speeds up io and other good things. It is almost always good, which is why the most modern database storage engines do it and do it by default.

Consider a csv file, and compare it to the same data stored as json objects, one row per line. The uncompressed json file is going to be much bigger, as the columns are repeated in every line. But both files gzip to much the same size, because all those keys are repeated again and again and the two files have basically the same entropy.

On the other hand, compressing each line in the file individually would be a poor choice, giving relatively poor gains.

There were database engines that did row-level compression, but these performed poorly and I know of nobody who used eg innodb compression.

I see several levels of compression:

1) First, as you mentioned, the limited symbol table for expressing all the moves.

2) But not every move is possible; you just need a way to order all the possible moves and then say the nth of those moves.

3) Even better: not all of those moves are equally likely. You can define (n cycles of) a chess playing engine that ranks all moves and reserves shorter symbols for the best moves (as judged within that time frame). Players tend to select higher ranked moves, so that’s another regularity to exploit.

Of course, 3) comes at the cost of compression/extraction time.