Hacker News new | ask | show | jobs
by branko_d 904 days ago
> A database with no indexes is slow.

No it’s not… if all you do is write to it. In fact, it’s the fastest possible database for such case.

Indexes are pure redundancy - they contain the data already in the base table which must be maintained during writes.

But they can make reads so much faster, if the data access pattern can utilize them. The key is to identify access patterns which justify the price of the index.

3 comments

In some cases they can make things worse, it's worth remembering that query optimizer looks not only on indexes, but also on statistics and estimated operation costs. If your statistics are out of date and your criteria are not specific enough (e.g. they match 80% rows), then an index is going to slow the query down. It needs to traverse the index to get the row IDs, fetch all the blocks containing them, read those, filter out irrelevant rows. It's probably going to be faster with a pure full table scan (due to linear reads).
If you only need to write and never read you don't need database.

> /dev/null will suffice.

And if you ever need to read anything even once database without indexes is slow

I can write to one database, replicate it (for example, by log shipping), and add an index only on the replica. This is not just being pedantic, this is a real-world pattern for some analytics solutions. You have a very high number of writes, and then build a reporting database at the end of every day, with all reads going to that database.
That is a very valid scenario. But when you do those things you already know costs and benefits of the indexes so you are not going to be harmed by "no indexes = database slow" heuristc.

"database = fast" is way worse heuristic to believe in for the people that need heuristics to move on with what they are doing.

> And if you ever need to read anything even once database without indexes is slow

This may be true in most cases, but not all. It just depends on your access pattern. If all you do are full table scans (possibly feeding to hash-joins), you won't benefit from indexes at all!

The whole point is that indexes do not auto-magically improve performance with no downsides. If they did, we could just index all columns and call it a day!

> If all you do are full table scans (possibly feeding to hash-joins), you won't benefit from indexes at all!

In most cases it means that you shouldn't be doing what youa re trying to do. Or best case, that maybe database is not the right tool for your job.

True, but it's still relevant in the early stages of iterating on a project. I'm familiar with one team that started investing in database-level optimization months before even beginning to deprecate their `loadAllRowsFromTheLargestTable` RPC.
To be pedantic: writes may also make use of indices, if you have constraints (like foreign keys) the db needs to check for every write.
True, but FK (in child table) must reference a key (in parent), and most databases won't let you create a key without the underlying index.

The other direction, however, is not a given: most DBs will let you create a FK on fields not covered by an index, so deleting or modifying a parent can benefit if you create such index explicitly, because it can check for the existence of children much faster (and avoid potentially locking the entire table). Again, the access pattern governs what indexes are needed: if you never delete/modify parent, you may not need an index on FK (unless you also have some queries which can use it, of course).