Hacker News new | ask | show | jobs
by dwrensha 1586 days ago
Over on http://golf.horse/ there are leaderboards for finding the smallest Javascript programs that output various word lists, including the Wordle list. I've found it to be a fun and educational challenge. I would be excited to see more submissions!
2 comments

Interesting that page suggests that the wordle dictionary is 'at most 11.11 bits per word', which works out to 18015 bytes, slightly higher than was achieved here.

Golf.horse is measuring the payload plus the compressor, which I don't believe the author is doing, and is important when trying to be objective about the relative strength of solutions. Otherwise you can store the entire file out of band in the compressor, emit 1 bit in the output file, and then the compressor just returns the expected value on 1 and throws an error on 0 saying the file was corrupt.

The other major difference is that golf.horse requires the payload to be valid uft8 javascript, which means your binary blobs are essentially limited to base64 encoded strings. You might manage slightly better, but I suspect the limit is somewhere around 6.5bits per byte.

On the gameboy, you have the advantage of being able to use the full 8 bits per byte.

The possible number of x-byte-long valid UTF-8 strings is defined with the following recurrence relation:

    f(-x) = 0
    f(0) = 1
    f(x) = 0x80 * f(x-1) + 0x780 * f(x-2) + 0xf400 * f(x-3) + 0x100000 * f(x-4)
(Replace 0x80 with 0x7c to account for ES6 template literals.) The characteristic polynomial for this recurrence has a positive root of 144.61 (or 141.12 for literals). This means that you can actually put quite more than 7 bits per byte in a valid JS code, provided that your decoder is negligibly small enough. Indeed, there exists an encoding that allows exactly 7 bits per byte by using two-byte-long UTF-8 sequence as an escape code [1].

[1] http://blog.kevinalbs.com/base122

Huh. Javascript is significantly more accepting of non-printing characters in it's strings than I was expecting. I guess I should have known better.
There's definitely apocryphal stories of someone winning a compression contest with this very approach :)
Hasegawa Sayuri wrote up some notes about their submissions at http://sayuri.tx0.org/golfhorse/, including an extremely elegant and compact encoding of huffman trees.