by nlnn
1456 days ago
I've found it's not just scale, but also the query patterns across the data being stored. I'm with you on using an RDBMS for almost everything, but I've worked on quite a few projects where alternatives were needed.

One involved a lot of analytics queries (aggregations, filters, grouping etc.) on ~100-200GB of data. No matter what we tried, we couldn't get enough performance from Postgres (column-based DBs / Parquet alternatives gave us 100x speedups for many queries).

Another was for storing ~100M rows of data in a table with ~70 columns or so of largely text-based data. The workload was predominantly random reads of subsets of 1M rows and ~20 columns at a time. Performance was also very poor in Postgres/MySQL. We ended up using a key/value store, heavily compressing everything before storing, and got a 30x speedup compared to an RDBMS, on a far smaller instance.

I wouldn't call either of them massive scale, more just data with very specific query needs.
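For anyone wondering what "compress everything before storing" can look like, here is a minimal sketch, not the actual setup described above. It assumes JSON-serialised rows, zlib compression, and a plain Python dict standing in for the key/value store; the names `put_row`, `get_columns`, and the `col0`..`col69` columns are made up for illustration.

```python
import json
import zlib

# Hypothetical stand-in for a real key/value store; a dict keeps the sketch self-contained.
store = {}

def put_row(key, row):
    """Serialise the whole row as compact JSON, compress it, and store the bytes under key."""
    payload = json.dumps(row, separators=(",", ":")).encode("utf-8")
    store[key] = zlib.compress(payload, 9)  # 9 = highest zlib compression level

def get_columns(key, columns):
    """Fetch one row by key, decompress it, and keep only the requested columns."""
    row = json.loads(zlib.decompress(store[key]).decode("utf-8"))
    return {name: row[name] for name in columns}

# A wide, text-heavy row (70 invented columns) to mimic the workload shape.
row = {f"col{i}": "some repetitive text " * 20 for i in range(70)}
put_row("row:1", row)

subset = get_columns("row:1", ["col0", "col5"])
raw_size = len(json.dumps(row).encode("utf-8"))
stored_size = len(store["row:1"])
```

The trade-off in this sketch is that every read decompresses the full row even when only a few columns are needed, which only pays off when compression shrinks the stored data substantially.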
Kimball's dimensional modelling helps a lot in cases like this, since there is probably a lot of repeated data in these columns.
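A rough sketch of that idea, using SQLite from the Python standard library: repeated text values move into a small dimension table, and the fact table keeps only an integer key. The table names (`dim_country`, `fact_events`), the columns, and the sample rows are all invented for illustration, not taken from the projects above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: each distinct text value is stored once.
    CREATE TABLE dim_country (
        country_id INTEGER PRIMARY KEY,
        name       TEXT UNIQUE
    );
    -- Fact table: references the dimension by a small integer key.
    CREATE TABLE fact_events (
        event_id   INTEGER PRIMARY KEY,
        country_id INTEGER REFERENCES dim_country(country_id),
        amount     REAL
    );
""")

def country_key(name):
    """Return the surrogate key for name, inserting a dimension row if needed."""
    conn.execute("INSERT OR IGNORE INTO dim_country (name) VALUES (?)", (name,))
    cur = conn.execute("SELECT country_id FROM dim_country WHERE name = ?", (name,))
    return cur.fetchone()[0]

# Invented sample data with a heavily repeated text column.
raw_rows = [("Germany", 10.0), ("France", 5.0), ("Germany", 2.5), ("Germany", 1.0)]
for name, amount in raw_rows:
    conn.execute(
        "INSERT INTO fact_events (country_id, amount) VALUES (?, ?)",
        (country_key(name), amount),
    )

# Aggregate on the fact table, joining the dimension only to recover the label.
totals = dict(conn.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_events AS f
    JOIN dim_country AS d USING (country_id)
    GROUP BY d.name
""").fetchall())
dim_count = conn.execute("SELECT COUNT(*) FROM dim_country").fetchone()[0]
```

With four fact rows the dimension table holds only two entries, so the repeated strings are stored once and the fact rows stay narrow.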