| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ltta_code 2783 days ago

Funny enough, even that paper seems biased as well to me.

This paper probably applies best when you use the HT or tree as a DB index data-structure without constant updates for huuuge datasets, probably not as well when used as an updated associative map in your average program. I'd guess hash-tables (HT) are still faster in day-to-day programmer use but not as much as pointed out here.

Caveats:

- HTs pre-allocated: In the two experiments that show those detailed bar-charts, the hash-tables are completely pre-allocated, so no re-allocation and re-hashing on inserts.

- Charts: The workloads IV-C and IV-E get by far the most attention (3/4 page, 9 bar charts - each), these best support the paper's statements that HTs are much faster and consume less mem than ART. The mixed-use case IV-D is reported with only a single small chart, the whole write-up is just about 1/4 page combined and ART does better (comparable, but still slower than fastest HT) - arguably IV-D is probably most realistic use-case for how most people use hashmaps day-to-day.

- The keys are 64-bit ints so aren't all that long, again maybe common in a DB index but not really what you use hashmaps for mostly in the real world. So ART can't play out some of its strengths.

- The data set skews large, larger than the original paper (16M to 1,000,000,000 (1Billion) keys), and most people probably foremost look at the charts on the right with the huge numbers that make HT look best. Again, makes the results not super applicable to day-to-day hashmap use imo.

- They only look up existing keys, probably not great for either ART nor HT but could hurt their CH*Bucket performance.

Most of these caveats are explained in the test, but some are kind of buried. You can't just look at the charts and conclude hash-tables are 5x as fast as ART, always.