Hacker News new | ask | show | jobs
by crazygringo 47 days ago
Sure, but the whole point is that you often don't know anything further about the data.

That's why b-trees are the standard in databases. The data could be anything, and its characteristics could massively change at any time, as you suddenly import a whole bunch of new rows at once.

And while you can certainly design algorithms around e.g. gradient descent to try to accelerate lookup, b-trees are already incredibly fast, and have lots of other benefits like predictable worse-case performance and I/O requirements, supporting range scans, ordered traversal, prefix conditions, etc.

So yes, you can certainly design lookup algorithms that are more efficient for particular data distributions, but they will also often lack other important properties. And b-trees are already so fast, improvements are often negligible -- like even if another algorithm produces a closer initial guess, it may be slower to locate the final item, or it may be faster on average but have horrible worst-case performance that makes it unusable.

Even with a paper dictionary, I've always used pretty much a binary search beyond the first initial guess, which only saves you a couple of hops. And actually, once I get to the right handful of pages I'm probably more linear than I should be, and I'd probably be faster if I tried to do a rigorous binary search, but I have to balance that with how long it takes to flip pages.

1 comments

Databases often use table statistics to try to do better at generating query plans. I wonder if they use them to make indexes faster as well?
The cost plan is a crude approximation of the actual query cost. Sometimes, the query planner makes a terrible guess. Your resident DBA won't appreciate being sometimes paged at 3 AM on a Sunday. A good strategy is to freeze the query plan once you have sufficient sample size of data in the involved tables.
Unfortunately, "freezing the query plan" isn't something available in many popular databases. It's not supported by e.g. Postgres, MySQL, or SQLite.
Perhaps not every aspect of the query plan can be dictated, but both MySQL and Postgres (with pg_hint_plan) allow you to specify hints that enforce specify join order and scan behavior for the tables in your query, which is where the majority of "unexpected change in query plan" problems will arise. As for SQLite, I'm less familiar with the knobs available for query tuning, but a cursory Google tells me that join order is respected when using CROSS JOIN, and index usage can be forced with INDEXED BY/NOT INDEXED.