Hacker News new | ask | show | jobs
by SeveredCross 5622 days ago
I found the two-queue technique to be much more intuitive than the binary heap technique, which definitely resulted in it being more fun to code.

As an aside, the two-queue technique is also more performant, as you can build the tree in O (n) time instead of O (n log n) time.

1 comments

You still have to sort to build the queues. So you won't get below O(n log n) for the whole algorithm.

If you squint just about right, you can probably see the equivalence between sorting plus queues and the tree based techniques. Especially if you use a tree-structured sorting algorithm.

Indeed, though as the OP points out just below, if you have fixed frequencies, you don't have to sort.

That said, you generally have fixed frequencies with a predetermined set of constraints on the data, and you can determine the most common character frequency (or it's already known). I'm not sure how common this is in real-world applications of Huffman coding.

Seeing as you should only be building the tree once though, you're right--you'd be hard-pressed to tell the difference between the two approaches, barring implementation mistakes. Designing an implementation mistake that would cause noticeable differences is an exercise left to the reader.

In my particular case, the character frequencies are (more or less) fixed and the sorting could be done ahead-of-time before feeding the leaf nodes into the first queue. So I really did get O(n) at instantiation time. However, my n is so small that it didn't matter much.
You can also get fix your whole encoding then.
You can if it's absolutely fixed. But what about when it's only "more-or-less-fixed"? :-)
Then you need the sorting, again. You can't have your cake and eat it, too.
Being able to relegate the cost of sorting to be a one-time cost on a server and allowing clients to build the tree they need for decoding in O(n) allows the client to [something about cake].
Oops, horrible typo there.