Hacker News new | ask | show | jobs
by blattimwind 2923 days ago
You're not scared, you're just too lazy to learn the tools of your trade.

Databases are not very complex and use pretty much only textbook data structures and algorithms. Understanding how they process a given query and how a query will probably perform/scale (even without EXPLAIN ANALYZE) is not hard to learn. You do need to learn it (at some point; you don't for small data, which is most). But it's far from difficult.

> That's why I tend to avoid systematically using a SQL engine unless the data schema is very very simple, and manage and filter the data case by case in code.

And that's the mentality that gives us webshops were applying a simple filter results in a couple seconds load time and uses hundreds of MB of RAM per request, server side.

2 comments

Databases are amongst the most complex systems you will ever use as a developer. At the limit, they rival operating systems for complexity - distributed concurrent systems with lots of low-level memory and filesystem action, along with a parser, optimizing compiler and often a code generator too.
The performance characteristics of them, though, are easy to learn, and easier to get a grasp on via EXPLAIN and such.
Given a database and a query, yes, you can understand the execution profile.

What you can't do is start out with a schema and a query and understand the execution profile. Query planners take statistics into account when determining which indexes to use or to fall back to scanning, and can normally only examine a fraction of possible join orders. So you can usually only fully predict the performance profile of simple queries. Databases can run perfectly well one day and fall off a performance cliff the next, when statistics change; doesn't happen often, but it can happen.

More likely, something works perfectly well in test and in the early days, but rapidly becomes untenable as the data grows and superlinear slowdowns emerge.

Thus you need to put run-time monitoring into place, and have a routine process of examining slow-running queries, and fixing things that are slow: rewriting queries, adding indexes, possibly denormalizing data, materializing views, etc. It's an ongoing process in any application that works with lots of data.

I might add, those processes have little to do with SQL. When you have a lot of data, and a lot of queries, then you're going to have to monitor and optimize your databases.

I might also add that a basic understanding of data modelling and what an index can and can't do is sufficient to avoid many, many performance pitfalls (this again has mostly nothing to do with SQL per se). Any undergrad course on databases teaches these basics.

The variance provided by an enormous pile of state and an algorithm which uses that state means it's a bit more unpredictable than most other systems, where you're used to seeing lines, curves and almost always monotonic series.
While that may be true for some of them, a lot of databases are far simpler.

Sqlite for example is conceptually very simple, and it's fairly cleanly layered, so you can aim to understand the parser, the compiler and the bytecode execution engine separately.

It's also exceptionally well documented. Look at the "Technical and Design Documentation" section here [1].

It won't tell you everything about how a more complex database like e.g. Postgres works, but it will give a very good overview of most of what is relevant for a database user that wants a better understanding of why a database does what it does with a given query.

[1] https://sqlite.org/docs.html

>Databases are not very complex and use pretty much only textbook data structures and algorithms

skeptical expression

Whether they use highly tuned or proprietary or obscure algorithms (they do) is less important than the fact that understanding b-trees and basic normal forms will get you 90% of the way to understanding how to use one. It's just not that hard as a database user.
If you are used to imperative programming it does take a bit of time before you get to thinking is sets. Looking at the code I have inherited, even some reasonably experienced developers don't get to that stage.

Do you really think that replacing an "ORDER BY" statement with your own sort is going to be simpler?

"________ is not very complex and uses pretty much only textbook data structures and algorithms" is pretty much a false statement for any engineered product.
And the difference is usually close to nil for understanding them. (Also see squirrelicus' comment). Whether my database uses a B-tree or a proprietary, patented, insanely complex and finely tuned data structure that does essentially the same thing as a B-tree just a little bit faster is, for all intents and purposes, not a relevant distinction.

Now it's true that most databases have a ton of features and even more optional features that can (and will) interact in interesting ways, but I thought it was fairly obvious that most applications use very few or none of these. They are the 90/10-sort of features; 10 % of a database vendor's customers need 90 % of the specialized features in a database, and every single one of them uses a different handful.

Obviously you don't need to understand all these specialized features to use a database; you only need to grasp the handful if any at all you actually need at a time. Applications striving for wide database compatibility tend to rarely use any of these, simply because they don't exist in all databases, or work differently, or have divergent interfaces.

So any time you have an application that runs on MySQL or postgres in production but is developed and tested on SQLite (an antipattern itself, but I digress), you can be assured that you'll only see fairly basic DDL and SQL.

(You also seem to be intermingling understanding and building. I can use and understand how a typewriter works without having a clue how to build one. Yes, there are lots of hard problems solved by databases, but how they do it is mostly a don't care. I don't have to care how SQLite does power-fail-safe transaction, it does and what that means for me, is all I have to know as a user.)