Hacker News new | ask | show | jobs
by jiggawatts 903 days ago
On disk, SQL Server uses only b-trees, unless using the new ColumnStore format.

In memory during a query it can use temporary indexes of other types, primarily hash tables and bitmaps.

Its performance on ad-hoc complex queries is about as good as it gets, few if any other RDBMS can beat its performance, but under the hood it’s still mostly just doing joins on b-trees!

1 comments

> ..under the hood it’s still mostly just doing joins on b-trees!

I could see the on-disk format needing to be simple and stable, but once the datas buffered who knows what structures and algorithms these proprietary engines are using? You would need to have done some reverse engineering or had hands-on details from the inside which presumably comes w/legal consequences for leaking them.

It tells you what it uses when you inspect the “query plans”. There’s some fairly technical explanation in the docs of what each of the operators do.

Generally the secret sauce in these things is the query optimiser heuristics.

The actual data structures and algorithm are often relatively simple.

Having said that, I’ve read their whitepaper on how they implement hash tables, and… it’s way more complex than I had assumed.

They cater for scenarios like many duplicated keys, parallel construction, unbalanced load across CPU cores, etc…

the query plan says "key lookup" or "index seek" can we really assume anything about the implementation?