Hacker News new | ask | show | jobs
by barrkel 2923 days ago
Databases are amongst the most complex systems you will ever use as a developer. At the limit, they rival operating systems for complexity - distributed concurrent systems with lots of low-level memory and filesystem action, along with a parser, optimizing compiler and often a code generator too.
2 comments

The performance characteristics of them, though, are easy to learn, and easier to get a grasp on via EXPLAIN and such.
Given a database and a query, yes, you can understand the execution profile.

What you can't do is start out with a schema and a query and understand the execution profile. Query planners take statistics into account when determining which indexes to use or to fall back to scanning, and can normally only examine a fraction of possible join orders. So you can usually only fully predict the performance profile of simple queries. Databases can run perfectly well one day and fall off a performance cliff the next, when statistics change; doesn't happen often, but it can happen.

More likely, something works perfectly well in test and in the early days, but rapidly becomes untenable as the data grows and superlinear slowdowns emerge.

Thus you need to put run-time monitoring into place, and have a routine process of examining slow-running queries, and fixing things that are slow: rewriting queries, adding indexes, possibly denormalizing data, materializing views, etc. It's an ongoing process in any application that works with lots of data.

I might add, those processes have little to do with SQL. When you have a lot of data, and a lot of queries, then you're going to have to monitor and optimize your databases.

I might also add that a basic understanding of data modelling and what an index can and can't do is sufficient to avoid many, many performance pitfalls (this again has mostly nothing to do with SQL per se). Any undergrad course on databases teaches these basics.

The variance provided by an enormous pile of state and an algorithm which uses that state means it's a bit more unpredictable than most other systems, where you're used to seeing lines, curves and almost always monotonic series.
While that may be true for some of them, a lot of databases are far simpler.

Sqlite for example is conceptually very simple, and it's fairly cleanly layered, so you can aim to understand the parser, the compiler and the bytecode execution engine separately.

It's also exceptionally well documented. Look at the "Technical and Design Documentation" section here [1].

It won't tell you everything about how a more complex database like e.g. Postgres works, but it will give a very good overview of most of what is relevant for a database user that wants a better understanding of why a database does what it does with a given query.

[1] https://sqlite.org/docs.html