|
|
|
|
|
by jamii
2783 days ago
|
|
My experience has been that the vast majority of papers on data-structures are at best misleading, and at worst deliberately biased. For example: > The hash table used by the authors of ART in their study was a chained hash table, but this kind of hash tables can be suboptimal in terms of space and performance due to their potentially high use of pointers. > Our experiments strongly indicate that neither ART nor Judy are competitive to the aforementioned hashing schemes in terms of performance, and, in the case of ART, sometimes not even in terms of space. https://www.victoralvarez.net/papers/A%20Comparison%20of%20A... |
|
This paper probably applies best when you use the HT or tree as a DB index data-structure without constant updates for huuuge datasets, probably not as well when used as an updated associative map in your average program. I'd guess hash-tables (HT) are still faster in day-to-day programmer use but not as much as pointed out here.
Caveats:
- HTs pre-allocated: In the two experiments that show those detailed bar-charts, the hash-tables are completely pre-allocated, so no re-allocation and re-hashing on inserts.
- Charts: The workloads IV-C and IV-E get by far the most attention (3/4 page, 9 bar charts - each), these best support the paper's statements that HTs are much faster and consume less mem than ART. The mixed-use case IV-D is reported with only a single small chart, the whole write-up is just about 1/4 page combined and ART does better (comparable, but still slower than fastest HT) - arguably IV-D is probably most realistic use-case for how most people use hashmaps day-to-day.
- The keys are 64-bit ints so aren't all that long, again maybe common in a DB index but not really what you use hashmaps for mostly in the real world. So ART can't play out some of its strengths.
- The data set skews large, larger than the original paper (16M to 1,000,000,000 (1Billion) keys), and most people probably foremost look at the charts on the right with the huge numbers that make HT look best. Again, makes the results not super applicable to day-to-day hashmap use imo.
- They only look up existing keys, probably not great for either ART nor HT but could hurt their CH*Bucket performance.
Most of these caveats are explained in the test, but some are kind of buried. You can't just look at the charts and conclude hash-tables are 5x as fast as ART, always.