| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rosser 4178 days ago

Index-organized tables (can) work very well if your access pattern is only ever by index (whether that's the pkey or "clustered" index, or a supplementary index that points to the leaf node of the clustered index on which the desired row(s) reside).

As soon as you find yourself doing a table (or "sequential") scan, however, you're incurring a tremendous amount of random IO, because you have to walk the index by breadth. Even if you're fully cached, or your db lives on SSD, that's a ridiculous amount of overhead.

The general rule of thumb, in my experience, puts the inflection point between whether it's better to use an index or do a table scan and then filter, is when you're >= 10% of the table. Less, the index will win; more, and the scan wins. It's surprising just how often you actually exceed that threshold in real world scenarios.

EDIT: See also the reply to your original comment, describing the need for additional row-adjacent (or supplementary heap) space to handle updates. Index-organized tables are far more sensitive to becoming unbalanced than indexes, and maintaining that additional space imposes a further IO penalty, because you still have to read those disk pages into memory, whether there's "live" data on them or not.