| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by hodgesrm 1459 days ago

The article left out one of the most fundamantal topics of databases--clustering of data in storage is everything. Examples:

1. If you store data in rows it's quite fast to insert/update/delete individual rows. Moreover, it's easy to do it concurrently. However reads can be very slow because you read the entire table if you scan a single column. That's why OLAP databases use column storage.

2. If you sort insert data in the table, reading ranges based on the sort key(s) is very fast. On the other hand inserts may spray data over over the entire table, (eventually) forcing writes to all blocks, which is very slow. That's why many OLTP databases use heap (unsorted) row organization.

In small databases you don't notice the differences, but they become dominant as volume increases. I believe this fact alone explains a lot of the proliferation of DBMS types as enterprise datasest have grown larger.

Edit: minor clarification