Hacker News new | ask | show | jobs
by polyrand 1385 days ago
Regarding hash joins, the SQLite documentation mentions the absence of real hash tables [0]

  SQLite constructs a transient index instead of a hash table in this instance 
  because it already has a robust and high performance B-Tree implementation at 
  hand, whereas a hash-table would need to be added. Adding a separate hash table 
  implementation to handle this one case would increase the size of the library 
  (which is designed for use on low-memory embedded devices) for minimal 
  performance gain.
It's already linked in the paper, but here's the link to the code used in the paper [1]

The paper mentions implementing Bloom filters for analytical queries an explains how they're used. I wonder if this is related to the query planner enhancements that landed on SQLite 3.38.0 [2]

  Use a Bloom filter to speed up large analytic queries.

[0]: https://www.sqlite.org/optoverview.html#hash_joins

[1]: https://github.com/UWHustle/sqlite-past-present-future

[2]: https://www.sqlite.org/releaselog/3_38_0.html

1 comments

That's correct, the optimizations from this paper became available in SQLite version 3.38.0.

As we were writing the paper, we did consider implementing hash joins in SQLite. However, we ultimately went with the Bloom filter methods because they resulted in large performance gains for minimal added complexity (2 virtual instructions, a simple data structure, and a small change to the query planner). Hash joins may indeed provide some additional performance gains, but the question (as noted above) is whether they are worth the added complexity.