Hacker News new | ask | show | jobs
by jandrewrogers 1459 days ago
You are understating the limitations of B+trees for real workloads. A common and growing problem is the lack of online indexing that scales, the particularly data model doesn't matter that much. Index construction throughput and scaling has been a serious problem at some pretty boring companies I've done work for.

Use of B+trees in new database kernels has definitely diminished. I'm not counting the installed base of SQLite etc. Ubiquity doesn't make something the pinnacle of technology -- just as often it means "legacy installed base". I still use PostgreSQL a lot and mod it when I need to but I am not under any illusions about its limitations.

A "modern" database kernel that can efficiently use modern hardware is going to be a thread-per-core architecture with all I/O and execution scheduling done in user space, and the ability to operate on modern storage densities found on database servers, which can exceed a petabyte of direct-attached storage. The implications of storage density and its interaction with indexing drive most of the real changes in the way database kernels are designed. You can find elements of this in open source, but mostly in big data platforms rather than proper database engines.

That said, no one builds new high-end databases for retail anymore, the economics don't make sense. All the money moved to more specialized implementations that cater to smaller audiences where you don't need to advertise. The kernels are fully general, and widely reused, but the interfaces and surrounding bits are purpose-built for particular workloads. Hell, my old storage engines are still used under license by that lot. The days of database billboards on the 101 are an anachronism.

3 comments

You keep talking about how B-Trees are rarely used but I’ve seen relatively new systems deployed that use them (or some minor variation). FoundationDB, FASTER, and a few others.

Other than in-memory hash indexing as used by SAP HANA, I’m not aware of any other data structures anywhere near as popular for database engines.

Can you name the data structure(s) that have superseded these?

> You are understating the limitations of B+trees for real workloads.

I never said anything about workloads. All I said was that your statements about B+Trees having dwindling usage are clearly false.

If you make a claim that is self-evidently bogus, then you shouldn't expect anything else that you may have said at the same time to be taken seriously.

Can you provide an explicitly named "modern" database that doesn't use B+tree indexes, and what specifically does it use instead?
" Like many modern analytical engines [18, 20], Procella does not use the conventional BTree style secondary indexes, opting instead for light weight secondary structures such as zone maps, bitmaps, bloom filters, partition and sort keys [1]. The metadata server serves this information during query planning time. These secondary structures are collected partly from the file headers during file registration, by the registration server, and partly lazily at query evaluation time by the data server. Schemas, table to file mapping, stats, zone maps and other metadata are mostly stored in the metadata store (in Bigtable [11] and Spanner [12])."

https://storage.googleapis.com/pub-tools-public-publication-...

> modern analytical engines

What about relational databases? Most are best suited for OLTP workloads.