|
|
|
|
|
by colanderman
3188 days ago
|
|
Keep in mind that the log n factor of a B+tree is generally very low; B+tree branching factors are typically in the 100s. Also, the first few levels are generally kept in cache, so you'll only have to hit disk for inner nodes past 100 million entries or so. Finally, hash indexes always require that the found row be confirmed in the data table, even for simple existence queries, since the keys themselves aren't stored in the hash table. (This is why hash indexes can't be UNIQUE.) B+trees can often answer such queries without the extra lookup (an "index-only scan"). If your B+tree is so large that its inner nodes spill onto disk (necessitating a 2nd disk seek), chances are the equivalent hash index will as well, which, combined with the consult of the data table, kind of negates the benefit. |
|
A hash index can be a big win for nested loop joins, especially at high concurrency. It is quite common to build a hash table over a subset of the inner table of an equijoin. This is (a) slow to construct and (b) memory-intensive (especially if many of these queries are run concurrently).
With a hash index, a lot of cases that required building a hash table to speed up a query can just use the hash index directly. Furthermore, every concurrent instance of the query can use the same hash index. This is a big win for both performance of a single query (latency) and query scalability.