by avifreedman
3578 days ago
Great comparison, and I hope the state of the world keeps getting better for TSDBs so we don't need to build our own at some point - but I disagree re:

"Performing queries across billions of metrics looking for labels that only match a few of them (a common scenario with time series data at scale) is really slow in Cassandra. This is because of the way it stores data in columns. This extends to any columnar database including Google's BigQuery which all have a natural disadvantage with time series data."

There's nothing inherently limiting in columnar databases that makes it slow to match only a few out of billions or trillions of records. ... but a classic columnar store might not be as efficient for storage, or might take 5-10x the nodes to return with the same speed with that kind of filtering, depending on the storage and clustering mechanisms used.
|
When somebody wants to query for a few points matching certain dimensions in Cassandra, there's no getting around the fact that you have to scan across potentially billions of data points.
If the index instead lives outside the datastore, in something relational like Postgres, the lookup becomes insanely cheap and you don't have to scan over a bunch of data.
There are quite a few databases that don't have an efficient external index. For those, running 10 times the number of nodes would certainly speed things up, but it's probably a better idea to just avoid databases like that if you want fast queries.
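
To make the scan vs. external index point concrete, here's a rough sketch, not anyone's production design. Everything in it is made up for illustration: the 10,000 fake series, the `host`/`dc` label names, the `series_labels` table, and the use of in-memory `sqlite3` as a stand-in for Postgres. The "datastore" is just a dict keyed by series id.

```python
import sqlite3

# Hypothetical datastore: 10,000 series, 3 points each, keyed by series id.
points = {sid: [(t, sid * 0.1 + t) for t in range(3)] for sid in range(10_000)}
# Hypothetical labels stored alongside the data (the "no external index" case).
labels = {
    sid: {"host": f"host-{sid % 2_500}", "dc": f"dc-{sid % 4}"}
    for sid in range(10_000)
}


def scan_lookup(host):
    """Full scan: look at every series' labels to find the few that match."""
    checked = 0
    hits = []
    for sid, tags in labels.items():
        checked += 1
        if tags["host"] == host:
            hits.append(sid)
    return sorted(hits), checked


# External label index. sqlite3 is only a stand-in for Postgres here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE series_labels (series_id INTEGER, key TEXT, value TEXT)")
db.execute("CREATE INDEX idx_kv ON series_labels (key, value)")
db.executemany(
    "INSERT INTO series_labels VALUES (?, ?, ?)",
    [(sid, k, v) for sid, tags in labels.items() for k, v in tags.items()],
)


def index_lookup(host):
    """Index lookup: ask the relational index which series ids carry the label."""
    rows = db.execute(
        "SELECT series_id FROM series_labels "
        "WHERE key = ? AND value = ? ORDER BY series_id",
        ("host", host),
    ).fetchall()
    return [row[0] for row in rows]


scan_ids, checked = scan_lookup("host-42")
index_ids = index_lookup("host-42")

# Only the matching series are ever read from the datastore.
matching_points = [p for sid in index_ids for p in points[sid]]

print(f"scan touched {checked} series to find {len(scan_ids)}")
print(f"index returned {len(index_ids)} series ids, {len(matching_points)} points fetched")
```

Both paths return the same series ids. The difference is that the scan inspects every series, while the indexed path only reads the handful of series the index points at, which is the gap that grows as the series count goes into the billions.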