Hacker News new | ask | show | jobs
by petergeoghegan 1456 days ago
> You don't actually address the point.

You said that B-Trees "use in indexing has dwindled with time". This is demonstrably false.

> Back then I used them ubiquitously but I honestly don't remember the last time I've seen one in a new design.

Even if that was true (which it definitely isn't), why would anybody judge the commercial or scientific relevance of B-Trees by looking at what new systems do? There are very few new systems that are intended to be competitive as general purpose systems, which is where most of the market is.

You still haven't actually named a single example of a "modern database kernel" that exemplifies what you're talking about.

2 comments

You are understating the limitations of B+trees for real workloads. A common and growing problem is the lack of online indexing that scales, the particularly data model doesn't matter that much. Index construction throughput and scaling has been a serious problem at some pretty boring companies I've done work for.

Use of B+trees in new database kernels has definitely diminished. I'm not counting the installed base of SQLite etc. Ubiquity doesn't make something the pinnacle of technology -- just as often it means "legacy installed base". I still use PostgreSQL a lot and mod it when I need to but I am not under any illusions about its limitations.

A "modern" database kernel that can efficiently use modern hardware is going to be a thread-per-core architecture with all I/O and execution scheduling done in user space, and the ability to operate on modern storage densities found on database servers, which can exceed a petabyte of direct-attached storage. The implications of storage density and its interaction with indexing drive most of the real changes in the way database kernels are designed. You can find elements of this in open source, but mostly in big data platforms rather than proper database engines.

That said, no one builds new high-end databases for retail anymore, the economics don't make sense. All the money moved to more specialized implementations that cater to smaller audiences where you don't need to advertise. The kernels are fully general, and widely reused, but the interfaces and surrounding bits are purpose-built for particular workloads. Hell, my old storage engines are still used under license by that lot. The days of database billboards on the 101 are an anachronism.

You keep talking about how B-Trees are rarely used but I’ve seen relatively new systems deployed that use them (or some minor variation). FoundationDB, FASTER, and a few others.

Other than in-memory hash indexing as used by SAP HANA, I’m not aware of any other data structures anywhere near as popular for database engines.

Can you name the data structure(s) that have superseded these?

> You are understating the limitations of B+trees for real workloads.

I never said anything about workloads. All I said was that your statements about B+Trees having dwindling usage are clearly false.

If you make a claim that is self-evidently bogus, then you shouldn't expect anything else that you may have said at the same time to be taken seriously.

Can you provide an explicitly named "modern" database that doesn't use B+tree indexes, and what specifically does it use instead?
" Like many modern analytical engines [18, 20], Procella does not use the conventional BTree style secondary indexes, opting instead for light weight secondary structures such as zone maps, bitmaps, bloom filters, partition and sort keys [1]. The metadata server serves this information during query planning time. These secondary structures are collected partly from the file headers during file registration, by the registration server, and partly lazily at query evaluation time by the data server. Schemas, table to file mapping, stats, zone maps and other metadata are mostly stored in the metadata store (in Bigtable [11] and Spanner [12])."

https://storage.googleapis.com/pub-tools-public-publication-...

> modern analytical engines

What about relational databases? Most are best suited for OLTP workloads.

Read this guy's past posts, you will save yourself a lot of time. He does a lot of this sort of thing.
...says the person who is confidently oblivious to the problems of mixing virtual methods and direct paged memory.
Thanks for the tip