Hacker News new | ask | show | jobs
by jbert 6048 days ago
> but the 95% historical data made writes slow because of the large number of indexes.

Why does updating an index take more time if there are many table rows?

i.e. if you have 5 indices on a table and 1 write/second, what's the difference in the work done by the db whether there are 1000 rows or 100k rows?

3 comments

We were dealing with terabytes of data and 10's of billions of writes per month. Most tables had 4 to 6 indexes, resulting in multiple updates for each write. If your data sets and indexes are huge there is little locality of data. Each row returned may have several disk hits. If you can keep your table sizes small enough that the indexes can remain in memory, performance is mush better.

Even worse were the autogenerated queries that joined a dozen tables together. I saw some that were 1500 lines long.

Simple explanation: Most database indexes are forms of B-trees. Insertion into a binary tree isn't constant time O(1), it's usually O(log n).
Thanks, I was thinking in terms of hashes (which doesn't really make sense for searching/ordering).
An index scales in proportion with the data it is indexing. The large size might produce more I/O and thus less than constant algorithmic complexity.