| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by tptacek 6417 days ago
	You use a hash table when you just need lookup. You use a tree when you need sorted order. The only tree worth using is a red-black tree. There, now you don't need to look it up. ;)

2 comments

wheels 6417 days ago

If I asked that question in an interview, and that's the answer you gave, I'd probably not hire you (assuming the position was CS-intensive).

There are a number of special-purpose trees which are important in some domains (tries, B-tree, splay tree, ...)

Simple hash implementations also tend to assume that it's cheap to compute a hash on the given key type (not always true) and that you have a rough idea on the bound of items in storage. Hashes far below that level are often also less memory efficient since the hash-table is pre-allocated.

link

tptacek 6417 days ago

Oh, bring it on, wheels.

First, take tries and B-trees out of consideration: they aren't generic search trees. Tries have a higher overhead than binary search trees. You use a trie when you need prefix search. You're wanking if you break out a trie otherwise. B-trees are for building external indices. Most people here have never written one.

If you don't agree with either of those, I'll still just say you're cheating by taking what I clearly meant as "binary search tree" and overgeneralizing. But I'll stand by my take on tries and b-trees. Tries are neat; I even wrote a pretty popular blog post about them:

   http://www.matasano.com/log/1009/aguri-coolest-data-structure-youve-never-heard-of/

... but I wouldn't try to make std::map work in terms of them.

Second, if you need to cache search results, you should cache search results explicitly, with a cache structure. Splay trees suck. They aren't balanced. They have worst case O(n). And they change on read; lookup can invalidate iterators. Lots of people write splay tree code. My theory as to why: splay trees are easy to write.

Finally, and most importantly: nobody in the history of the universe has written and put into production a tree ADT library (at least, not in C/C++) that didn't have catastrophic bugs. A balanced binary tree is something you want to have written once and tested and debugged, and then never touch again. If you're going to get balanced trees right once, what's it going to be? AVL? Actual 2-3 trees? The industry already sorted this out, which is why all the STLs use red-black.

(For the record: I agree with you on hashing being overrated and binary trees being underused, but I'm still more likely to use a generic hash table than a tree for simple lookup).

link

wheels 6416 days ago

Sure, now there's a real answer. :-)

What I was mostly taking offense to was: "The only tree worth using is a red-black tree." There are a number of special trees that are useful in specific domains. Especially in a comment addressed at folks that weren't familiar with tree structures, this seemed important to note.

Other than that there's nothing in there I'd disagree with. On splay trees -- they're not just simple to implement, they're also lightweight. Often useful for the cases where you only half-way care. Tries and B-trees obviously aren't general purpose.

link

swombat 6416 days ago

I wouldn't dare show my face in a CS-intensive interview and not know this. Fortunately, that's not my bread and butter at the moment. :-)

link

swombat 6417 days ago

Cheers :-)

link