Hacker News new | ask | show | jobs
by corruptio 1579 days ago
oo oo, idea... trying to implement it now:

With 5 bits per letter, you have 6 symbols left over. We can use those to represent alternate pairs like "A or E", so you can encode BANDS and BENDS at the same time. Looks like if you pick the 6 highest frequency replacements for each starting letter, you can reduce the full word list size by ~2k words.

A naive lookup table for the replacements is 26 * 2 * 6 = 312 bytes.

edit: oops double counted the reduction

2 comments

Hmmm... sounds familiar https://justinlloyd.li/blog/word-search-game-part-two/

Many times you don't even need to store the individual letters, just the pairings, and if you are permitted to prune out troublesome words from your dictionary, all the better.

Since delta encoding is applied after this step, it's probably better to just use 26^5 instead of trying to pack extra things into those bits.
oook did some experiments... just counting the size of the delta streams: 17346B for 26^4 and 16852B for 32^4 (as described above)

interestingly, the sweet spot is a mix at 30^4 at 16797B

I've been using the cleaned up list, so it's possible that changes the behavior a little bit.

But more likely one of us has a bug in their logic.