| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rosser 4011 days ago

Your scenario is valid for "index organized tables", where the data for a given row is stored in one of the index's leaf pages. Synthesizing an index on-demand for those involves an extraordinary amount of random IO.

In postgres, a table's data is stored in the "heap" (a disk file representing the table, itself; indexes are separate disk files). Consequently, you can do a sequential pass through the table, and then calculate/build the index in RAM (assuming you have sufficient "maintenance_work_mem", in postgres configuration terms, otherwise you'll spill to disk at this step, too), and then write it out sequentially as well — and only then walk the index depth-first to get to the leaf page that points to the disk page in the heap that contains the row you're selecting.

(And even that random depth-first traversal of the index will probably actually happen in RAM, because postgres significantly defers to the kernel's buffer cache to mitigate IO costs, and the index you're traversing is pretty much guaranteed to be cached at this point.)

EDIT: It would apply to MySQL using InnoDB, however, as IIRC that does store table data on the leave pages of the primary key index. Can someone corroborate that?