Hacker News new | ask | show | jobs
by code_biologist 913 days ago
A database with no indexes is slow. Finding a particular row will require a linear search.

The crux is understanding what data access patterns you will have and what indexes / data structures accelerate that access pattern. "Index = fast" is a painfully pernicious untrue meme. It's absolutely true for application tables with queries only touching a few rows. On the other hand, analytics queries touching a high proportion of rows with joins on equality conditions (ie. hash joinable) isn't going to go any faster with an index.

I've seen devs shotgun indexes at tables to fix performance (done it myself too) but the real test of index understanding is when that doesn't work.

4 comments

> On the other hand, analytics queries touching a high proportion of rows with joins on equality conditions (ie. hash joinable) isn't going to go any faster with an index.

That's when you bring a BRIN to the table. :-)

Or change the layout entirely to clustered columnstore from row store.

All databases are datastructures on disk and memory optimized for specific access and write patterns.

> "Index = fast" is a painfully pernicious untrue meme.

I believe that lack of internalization of that meme (regardless of how true it is) can be a cause of real trouble.

I was working in a team where Java devs simply didn't bother to put indexes on tables because they were small (like 100 rows or so). When I (JS dev) pestered them long enough to finally do it suddenly the whole app got super snappy and they were very thankful as it happened just as degrading performance was causing a lot of gloomy mood.

If a table has 100 rows and those rows aren't huuuuge, there is no reason to put in index. The rows probably fit into one page or so anyways and are all read together anyways.

Why are you mentioning that it were Java devs ad that you are a JS (javascript?) dev? Does that give you any kind of... expertise in the matter?

In theory, with pages, caching and all, yes. In practice it made collosal difference.

With correct indexes the queries were able to be found only with index lookup without touching data at all. It was many times faster than sequential scan of even this few rows.

I'm mentioning our respective roles to show that people with nominally no expertise make such decisions often in practice and such "memes", even tough they are not strictly true in all cases, may bring a lot of value in practical setting.

"No inedex = slow" is a good heuristic for nearly all devs.

> With correct indexes the queries were able to be found only with index lookup without touching data at all

That doesn't make things faster, unless the rows contain so much data, that they force many page reads. Otherwise, reading the whole index or reading all rows takes the same time.

I think there is some information that you are not providing that might make a difference - or it is actually a misunderstanding on your side and if the queries really got faster, it's because of something else, such as a join.

If it didn't matter noone would ever use binary search for the amount of data that fits single page. Linear search would always be preferred as it is simpler.

The only thing I'm not providing are the exact sizes of the tables involved other than that they were considered small. Some having 100 rows but I can't rule out that some were 1000. I didn't say each of those tables fit a single page which I agree could be significant. I am completely sure that the difference was caused by adding indexes to small tables, nothing else. The improvement was achieved in a single day by a sigle person adding indexes where there were none. He observed improvement as he went from table to table. I was specifically thanked by him for pestering him about it. He was informal tech lead for this project. The application was fairly large at that time and performance issues were everywhere. What was also used everywhere were those small tables containing stuff like list of countries, languages, currencies, types of operations and such.

> isn’t going to go any faster with an index

Depends on the data, and the index. A covering index would almost certainly be faster. But you said analytics, which implies large results from huge datasets, so it’s unlikely to fit into memory in the first place.

> The crux is understanding what data access patterns you will have and what indexes / data structures accelerate that access pattern

We have a winner. But when looking at SQL tables/Views/Stored Procedures, the data is also stored in order in memory, in effect have a master database and files on disk, with sorted databases and files in memory for faster access.