|
|
|
|
|
by polyrand
1385 days ago
|
|
Regarding hash joins, the SQLite documentation mentions the absence of real hash tables [0] SQLite constructs a transient index instead of a hash table in this instance
because it already has a robust and high performance B-Tree implementation at
hand, whereas a hash-table would need to be added. Adding a separate hash table
implementation to handle this one case would increase the size of the library
(which is designed for use on low-memory embedded devices) for minimal
performance gain.
It's already linked in the paper, but here's the link to the code used in the paper [1]The paper mentions implementing Bloom filters for analytical queries an explains how they're used. I wonder if this is related to the query planner enhancements that landed on SQLite 3.38.0 [2] Use a Bloom filter to speed up large analytic queries.
[0]: https://www.sqlite.org/optoverview.html#hash_joins[1]: https://github.com/UWHustle/sqlite-past-present-future [2]: https://www.sqlite.org/releaselog/3_38_0.html |
|
As we were writing the paper, we did consider implementing hash joins in SQLite. However, we ultimately went with the Bloom filter methods because they resulted in large performance gains for minimal added complexity (2 virtual instructions, a simple data structure, and a small change to the query planner). Hash joins may indeed provide some additional performance gains, but the question (as noted above) is whether they are worth the added complexity.