| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jbert 6095 days ago

> but the 95% historical data made writes slow because of the large number of indexes.

Why does updating an index take more time if there are many table rows?

i.e. if you have 5 indices on a table and 1 write/second, what's the difference in the work done by the db whether there are 1000 rows or 100k rows?

3 comments

russell 6095 days ago

We were dealing with terabytes of data and 10's of billions of writes per month. Most tables had 4 to 6 indexes, resulting in multiple updates for each write. If your data sets and indexes are huge there is little locality of data. Each row returned may have several disk hits. If you can keep your table sizes small enough that the indexes can remain in memory, performance is mush better.

Even worse were the autogenerated queries that joined a dozen tables together. I saw some that were 1500 lines long.

link

nixme 6095 days ago

Simple explanation: Most database indexes are forms of B-trees. Insertion into a binary tree isn't constant time O(1), it's usually O(log n).

link

jbert 6094 days ago

Thanks, I was thinking in terms of hashes (which doesn't really make sense for searching/ordering).

link

enqk 6095 days ago

An index scales in proportion with the data it is indexing. The large size might produce more I/O and thus less than constant algorithmic complexity.

link