by nlnn 1456 days ago

I've found it's not just scale, but also down to query patterns across the data being stored.

I'm with you on using an RDBMS for almost everything, but I've worked on quite a few projects where alternatives were needed.

One involved a lot of analytics queries (aggregations, filters, grouping etc.) on ~100-200GB of data. No matter what we tried, we couldn't get enough performance from Postgres (column-based DBs / Parquet alternatives gave us 100x speedups for many queries).
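To illustrate why layout matters for this kind of query, here is a toy sketch in plain Python, not the setup from that project. The field names (`user_id`, `country`, `amount`, `note`) and both helper functions are made up for the example. The row version has to touch every whole record to sum one field, while the column version only walks the two columns the aggregation needs.

```python
from array import array

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"user_id": 1, "country": "DE", "amount": 10.0, "note": "a"},
    {"user_id": 2, "country": "US", "amount": 25.5, "note": "b"},
    {"user_id": 3, "country": "DE", "amount": 4.5, "note": "c"},
]

# Column-oriented layout: one contiguous sequence per field.
columns = {
    "user_id": array("q", (r["user_id"] for r in rows)),
    "country": [r["country"] for r in rows],
    "amount": array("d", (r["amount"] for r in rows)),
    "note": [r["note"] for r in rows],
}


def total_by_country_rows(rows):
    """Group and sum by visiting every full record."""
    totals = {}
    for r in rows:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals


def total_by_country_columns(columns):
    """Group and sum by reading only the two columns involved."""
    totals = {}
    for country, amount in zip(columns["country"], columns["amount"]):
        totals[country] = totals.get(country, 0.0) + amount
    return totals


print(total_by_country_rows(rows))
print(total_by_country_columns(columns))
```

Both functions return the same totals. In a real column store the untouched columns are never read from disk at all, and each column tends to compress well because its values are similar.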

Another was for storing ~100M rows of data in a table with ~70 columns or so of largely text based data. Workload was predominantly random reads of subsets of 1M rows and ~20 columns at a time. Performance was also very poor in Postgres/MySQL. We ended up using a key/value store, heavily compressing everything before storing, and got a 30x speedup over the RDBMS while running on a far smaller instance.
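A rough sketch of the compress-then-store idea, under loud assumptions: a plain dict stands in for the key/value store, JSON plus zlib stands in for whatever serialization and compression were actually used, and `put_row` / `get_columns` are hypothetical helper names.

```python
import json
import zlib

store = {}  # stand-in for a real key/value store (assumption)


def put_row(store, key, row):
    """Serialize a row to compact JSON and compress it before storing."""
    payload = json.dumps(row, separators=(",", ":")).encode("utf-8")
    store[key] = zlib.compress(payload, 9)


def get_columns(store, keys, wanted):
    """Random-read a subset of rows and keep only the wanted columns."""
    out = []
    for key in keys:
        row = json.loads(zlib.decompress(store[key]).decode("utf-8"))
        out.append({name: row[name] for name in wanted})
    return out


# Fake data: 1000 rows, 70 text columns with a lot of repetition.
for i in range(1000):
    put_row(store, i, {f"col{j}": f"some repeated text value {i % 7}" for j in range(70)})

subset = get_columns(store, [3, 500, 999], ["col0", "col19"])
print(subset)
```

Each read decompresses a whole row and then discards the columns it doesn't need, so this only pays off when the compressed value is small and lookups are by key.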

I wouldn't call either of them massive scale, more just data with very specific query needs.

2 comments

giovannibonetti 1456 days ago

> Another was for storing ~100M rows of data in a table with ~70 columns or so of largely text based data. Workload was predominantly random reads of subsets of 1M rows and ~20 columns at a time.

Kimball's dimensional modelling helps a lot in cases like this, since probably there is a lot of repeated data in these columns.
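As a minimal sketch of what that looks like, using SQLite from Python as a stand-in for a real warehouse: repeated text values move into a small dimension table and the big table keeps only an integer key. The table names (`dim_category`, `fact_item`) and the `category_id` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE fact_item (
        item_id INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES dim_category(category_id)
    );
""")


def category_id(conn, name):
    """Return the surrogate key for a text value, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO dim_category (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT category_id FROM dim_category WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


# Raw data with heavily repeated text values.
raw = ["electronics", "books", "electronics", "electronics", "books"]
for item_id, name in enumerate(raw, start=1):
    conn.execute(
        "INSERT INTO fact_item (item_id, category_id) VALUES (?, ?)",
        (item_id, category_id(conn, name)),
    )

# Each distinct string is stored once; the fact table holds only integers.
dim_count = conn.execute("SELECT COUNT(*) FROM dim_category").fetchone()[0]
per_category = conn.execute("""
    SELECT d.name, COUNT(*)
    FROM fact_item f JOIN dim_category d USING (category_id)
    GROUP BY d.name ORDER BY d.name
""").fetchall()
print(dim_count, per_category)
```

How much this helps depends on how repetitive the columns really are, which is a guess here.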

snarfy 1456 days ago

It's a pretty old problem, as they are competing ideas. It's OLTP vs OLAP. Postgres is designed for OLTP.

nlnn 1454 days ago

Yeah, I think some of the problems come up when both are needed on the same data, or when the use case changes over time.

E.g. our customers are stored in Postgres, so let's also log their actions there linked to the user table.

Five years on, someone decides we need to run analytical queries across years of 200M logged actions, joined with other data in the DB.

So now we either have to live with horrible performance, migrate the logs to something suitable for OLAP (and lose all the benefits of a solid RDBMS), or have some syncing/export process to duplicate the data somewhere suitable for querying.
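For the third option, one common shape is an incremental export keyed on a watermark. This is only a sketch under assumptions: SQLite stands in for Postgres, a Python list stands in for the analytics store, the `action_log` table and `sync_new_actions` function are hypothetical, and the log is assumed to be append-only with a monotonically increasing `id`.

```python
import sqlite3

source = sqlite3.connect(":memory:")  # stand-in for the OLTP database
source.execute(
    "CREATE TABLE action_log (id INTEGER PRIMARY KEY, user_id INTEGER, action TEXT)"
)

analytics_rows = []  # stand-in for the OLAP-side table
last_synced_id = 0   # watermark: highest id already exported


def sync_new_actions(source, sink, watermark, batch_size=1000):
    """Copy rows with id > watermark to the sink in batches; return the new watermark."""
    while True:
        batch = source.execute(
            "SELECT id, user_id, action FROM action_log "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (watermark, batch_size),
        ).fetchall()
        if not batch:
            return watermark
        sink.extend(batch)
        watermark = batch[-1][0]


source.executemany(
    "INSERT INTO action_log (user_id, action) VALUES (?, ?)",
    [(1, "login"), (2, "view"), (1, "logout")],
)
last_synced_id = sync_new_actions(source, analytics_rows, last_synced_id, batch_size=2)

# A later run only picks up rows added since the previous sync.
source.execute("INSERT INTO action_log (user_id, action) VALUES (?, ?)", (3, "login"))
last_synced_id = sync_new_actions(source, analytics_rows, last_synced_id, batch_size=2)
print(last_synced_id, len(analytics_rows))
```

Updates and deletes on the source side are not handled here, which is part of why this option is more work than it first looks.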
