Hacker News new | ask | show | jobs
by Someone 1576 days ago
I feared that (“This list is a lot shorter, so there will be fewer opportunities for savings”)

However, I think you can layout the tree so that no pointers point backwards. If so, can you make those offsets smaller by making them relative to the current point in the tree?

Also, since the list only has five-letter words, for the last letter, you don’t even need the letters themselves, just 26 bits for what letters can complete a word. That might be a saving.

Also, the crab source code is available (DEC: http://www.gtoal.com/wordgames/gatekeeper/crab.sh.txt, Mac: http://www.gtoal.com/wordgames/jacobson+appel/mac/Crab_sourc.... Both via http://www.gtoal.com/wordgames/scrabble.html)

Both are nice examples of C the way it is intended to be written, or rather, was intended to be written decades ago.

I don’t remember how that stores the data, but it might do a trick you didn’t think of.

And finally, I just realize that, for fairness, you need to look at (data size + decompressor size). Did you do that?

1 comments

Ah, I replied to myself with more information while you were also replying.

I also surmise that the short length of the words makes a DAWG just very heavy.

It's not clear to me that relative offsets would be notably smaller to the extent that would be needed. Even a hypothetical and cheated DAWG I came up with is ~33% bigger than alternatives. I've generally explored enough (see the paper in my other comment) that I don't feel that further investigations into a DAWG are likely to outperform other methods.

I can't see anything immediately that jumps out that the Crab game is doing that's special to save space, I think it just achieves better compression because you can compress larger files easier, and the words are longer with more overlapping sections.

I agree that you need to compare including the decompressor size, so I'm not sure which approach is better the Huffman trie or the one in the original article. I'm not familiar enough with GB programming to be able to suggest how much program memory would be needed to decode the Huffman Trie, it looks like it would be somewhat similar in complexity.

Thanks for all the informational replies.