by gfody
915 days ago
> ..under the hood it’s still mostly just doing joins on b-trees! I could see the on-disk format needing to be simple and stable, but once the data’s buffered, who knows what structures and algorithms these proprietary engines are using? You would need to have done some reverse engineering or had hands-on details from the inside, which presumably comes w/ legal consequences for leaking them.
Generally the secret sauce in these things is the query optimiser heuristics.
The actual data structures and algorithms are often relatively simple.
Having said that, I’ve read their whitepaper on how they implement hash tables, and… it’s way more complex than I had assumed.
They cater for scenarios like many duplicated keys, parallel construction, unbalanced load across CPU cores, etc…
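
For a sense of what the "relatively simple" baseline looks like, here is a rough sketch of a textbook single-threaded hash join in Python. It is not taken from any vendor's whitepaper; the names `build_hash_table` and `hash_join` and the sample rows are made up for illustration. The only wrinkle it handles is duplicated keys, by keeping a list of rows per key.

```python
from collections import defaultdict

def build_hash_table(rows, key):
    """Build side: map each join key to every row that carries it."""
    table = defaultdict(list)
    for row in rows:
        # Duplicate keys simply extend the per-key list.
        table[row[key]].append(row)
    return table

def hash_join(build_rows, probe_rows, key):
    """Inner equi-join: build once, then probe once per probe row."""
    table = build_hash_table(build_rows, key)
    out = []
    for probe in probe_rows:
        # A probe row with no matching key contributes nothing.
        for match in table.get(probe[key], []):
            out.append({**match, **probe})
    return out

# Hypothetical sample data: customer 1 has two orders (a duplicated key).
orders = [
    {"cust": 1, "order": "a"},
    {"cust": 1, "order": "b"},
    {"cust": 2, "order": "c"},
]
customers = [{"cust": 1, "name": "Ann"}, {"cust": 3, "name": "Bob"}]

joined = hash_join(orders, customers, "cust")
```

With a heavily skewed key, one of those per-key lists ends up holding most of the rows, and a naive parallel version that partitions by key would hand most of the work to a single core. That is presumably the kind of case the extra complexity is there for.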