by mattb314
3141 days ago
I'm a little confused about the columnar database comment:

> Performing queries across billions of metrics looking for labels that only match a few of them (a common scenario with time series data at scale) is really slow in Cassandra. This is because of the way it stores data in columns. This extends to any columnar database including Google's BigQuery which all have a natural disadvantage with time series data.

I've pretty much only heard "columnar database" used as the opposite of a row-store database, and storing time series data in columns seems to make much more sense. Could someone explain exactly why "labels" (which I probably don't understand) are so much harder for column stores to deal with?
Storing labels in a row-based system (like a SQL database) lets you query by label value rather than by column name, so the lookup can use the database's indexes and query optimizations, which makes it a lot faster.
That said, nothing stops you from doing both: DalmatinerDB, for example, uses a column-based format for metric values but a row-based store (PostgreSQL) for dimensions.
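A minimal sketch of that hybrid idea, assuming an in-memory SQLite table stands in for PostgreSQL and plain per-series Python lists stand in for a columnar value store. The `labels` table and the `add_series`, `write` and `query` functions are hypothetical illustrations, not DalmatinerDB's actual schema or API.

```python
import sqlite3
from collections import defaultdict

# Values: column-oriented, one contiguous list of (timestamp, value) per series.
# (Hypothetical stand-in for a real columnar store.)
values = defaultdict(list)

# Dimensions: row-oriented, one row per (series, label key, label value).
# (Hypothetical stand-in for the PostgreSQL side.)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE labels (series_id INTEGER, key TEXT, value TEXT)")
db.execute("CREATE INDEX labels_kv ON labels (key, value)")


def add_series(series_id, labels):
    """Register a series and its labels as ordinary indexed rows."""
    db.executemany(
        "INSERT INTO labels VALUES (?, ?, ?)",
        [(series_id, k, v) for k, v in labels.items()],
    )


def write(series_id, ts, value):
    """Append a data point to the series' value column."""
    values[series_id].append((ts, value))


def query(key, value):
    """Find matching series via the (key, value) index, then read only their values."""
    rows = db.execute(
        "SELECT series_id FROM labels WHERE key = ? AND value = ? ORDER BY series_id",
        (key, value),
    )
    return {sid: values[sid] for (sid,) in rows}


add_series(1, {"host": "web-1", "dc": "ams"})
add_series(2, {"host": "web-2", "dc": "ams"})
add_series(3, {"host": "db-1", "dc": "fra"})
write(1, 0, 0.5)
write(1, 60, 0.7)
write(2, 0, 0.2)
write(3, 0, 0.9)

print(query("dc", "ams"))  # {1: [(0, 0.5), (60, 0.7)], 2: [(0, 0.2)]}
```

The label match is an index lookup on the label value, so finding the few series tagged `dc=ams` doesn't require scanning every series, while the values themselves stay in a per-series, column-friendly layout.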