Hacker News new | ask | show | jobs
by manigandham 2777 days ago
All modern columnstores can handle vast ingest rates and query speeds. It's all down to sharding, zone maps and sparse indexing, fast algorithms that operate on compressed data, and storage throughput. These are well-solved problems at this point.

Your blog post doesn't mention a single columnstore database though. KDB+, Clickhouse, MemSQL, or any of the GPU-powered variations will happily beat any TSDB out there.

2 comments

They can't handle high cardinality. Imagine having millions of columns in the column-oriented database (70% of those columns are updated every second). Imagine that you have to add new columns all the time.

The main misconception about TSDB's is that it's just a data with timestamp. TSDB's has multi-dimentional data model, time is only one of the dimensions.

You don't need to add new columns.

   CREATE TABLE metrics (metric_name text, ts timestamp, properties json, key(metric_name, ts))
OLAP queries with SQL are very good at handling whatever dimensions you want.
'metric_name text' is actually a tag-value list. Many TSDB's allows you to match data by tag. Each tag should be represented by a column in your example.

Single table design will be prone to high read/write amplification due to data alignment. Usually, you need to read many series at once so your query will turn into full table scan. Or it will read a lot of unneeded data which happened to be located near the data you need. Writes will be slow since your key starts with metrics name. Imagine that you have 1M series and each series gets new data point every second. In your scema it will result in 1M random writes.

Cardinality of the table will go through the roof, BTW. Every data point will add the key. Good luck dealing with this.

> 'metric_name text' is actually a tag-value list. Many TSDB's allows you to match data by tag. Each tag should be represented by a column in your example.

For the life of me, I can't figure out why this would be a good idea. I feel like I must not understand what you're saying:

If I've got a million disks that I want to draw usage graphs for, why I would put each one in a separate column?

What's the business use-case you're imagining?

> Usually, you need to read many series at once so your query will turn into full table scan.

Why do I need a full table scan if I'm going to draw some graphs?

I've got something like 4000 pixels across my screen; I could supersample by 100x and still be pulling down less data than the average nodejs/webpack app.

> Imagine that you have 1M series and each series gets new data point every second. In your scema it will result in 1M random writes.

No that's definitely not what manigandham is suggesting. One million disks each reporting their usage means a million rows in two columns (disk name/sym, and volume) would be written (relatively) linearly.

Modern TSDB is expected to support tags. This means that every series will have a unique set of tag-value pairs associated with it. E.g. 'host=Foo OS=CentOS arch=amd64 ssdmodel=Intel545 ...'. And in the query you could pinpoint relevant series by this tags thus the tags should be searchable. For instance, I may want to see how specific SSD model performs on specific OS or specific app. If the set of tags is stored as a json in one field such queries wouldn't work efficiently.

About that 1M writes thing. You have two options. 1) Organize data by metric name first, or 2) by timestamp. In case of 2) the updates will be linear but reads will have huge amplification. In case of 1) updates will be random, but reads will be fast.

You are talking about regular relational databases. I'm talking about distributed column-oriented databases. Big difference.

You can store tags and other data in JSON/ARRAY columns. The primary key is used for automatically sharding and sorting.

Groups of rows are sorted, split into columns, compressed, and stored as partitions with metadata. This means you can 'scan' the entire table in milliseconds using metadata and then only open the partitions, and the columns inside, that you actually need for your query. There are no random writes either, it's all constant sequential I/O with optional background optimization. And because of compression, storing the same key millions of times has no real overhead.

As stated several times before, we deal with this everyday on trillion row tables inserting 100s of billions of rows daily. Queries run in seconds. We do just fine.

We have a lot of data stored in Postgres JSON fields at my work. Around three months ago, we were trying to optimize some queries by adding sub-key indexing to the JSON field. We tried multiple times, but Postgres seemed to keep using sequential scan on the records, rather than the JSON index. So, we just decided to normalize the data and use proper foreign key fields for query performance.
Sure they can handle them, just not in an economically viable fashion.
Compared to what? Economically viable is very vague and relative. Columnar storage can easily reach 90% compression levels, is faster to read, and vectorized processing beats per-row/record iteration, so there's a reason it's the best for OLAP currently.

Why not benchmark IronDB against Clickhouse and post the results?

90% compression on time series is not viable unless you have some very specific dataset.
Both of you commenters have your own TSDBs which seems to be coloring all of your posts.

I'm going to leave this conversation as unproductive unless you care to benchmark your products against modern column-stores, although I think it's telling that there are never such benchmarks available.