> 'metric_name text' is actually a tag-value list. Many TSDBs allow you to match data by tag. Each tag should be represented by a column in your example.

For the life of me, I can't figure out why this would be a good idea. I feel like I must not understand what you're saying: if I've got a million disks that I want to draw usage graphs for, why would I put each one in a separate column? What's the business use case you're imagining?

> Usually, you need to read many series at once, so your query will turn into a full table scan.

Why do I need a full table scan if I'm going to draw some graphs? I've got something like 4000 pixels across my screen; I could supersample by 100x and still be pulling down less data than the average nodejs/webpack app.

> Imagine that you have 1M series and each series gets a new data point every second. In your schema it will result in 1M random writes.

No, that's definitely not what manigandham is suggesting. One million disks each reporting their usage means a million rows in two columns (disk name/sym and volume) would be written (relatively) linearly.
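To make the pixel argument concrete, here is a minimal sketch of bucket-averaging downsampling, assuming the points for one series arrive as `(timestamp, value)` pairs. The `downsample` name and its signature are hypothetical, not any particular TSDB's API. The idea is that a graph never needs more than one value per bucket, and the bucket count is bounded by the screen width, so a week of 1-second samples (604,800 points) collapses to 4000 values.

```python
from collections import defaultdict


def downsample(points, t_start, t_end, buckets):
    """Average (timestamp, value) points into at most `buckets` equal-width time buckets.

    Hypothetical helper for illustration. Returns a sorted list of
    (bucket_index, mean_value) pairs for the buckets that received data.
    """
    if t_end <= t_start or buckets <= 0:
        raise ValueError("need t_end > t_start and buckets > 0")
    width = (t_end - t_start) / buckets
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in points:
        if not (t_start <= ts < t_end):
            continue  # point lies outside the visible time window
        # Clamp in case floating-point rounding pushes the last point past the end.
        idx = min(int((ts - t_start) / width), buckets - 1)
        sums[idx] += value
        counts[idx] += 1
    return [(i, sums[i] / counts[i]) for i in sorted(sums)]


if __name__ == "__main__":
    # One week of 1-second samples, reduced to one value per pixel column.
    week = [(t, float(t % 100)) for t in range(7 * 24 * 3600)]
    print(len(downsample(week, 0, 7 * 24 * 3600, 4000)))
```

Whether this runs in the database or the client, the amount of data the graph actually needs is bounded by the pixel count rather than by the raw sample count.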
About that 1M writes thing. You have two options: 1) organize data by metric name first, or 2) organize it by timestamp first. In the case of 2), updates will be linear, but reads will suffer huge amplification. In the case of 1), updates will be random, but reads will be fast.
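The trade-off can be illustrated with a toy model: storage is a single key space kept in sorted order (as in a clustered index), keyed either `(series, ts)` (metric first) or `(ts, series)` (timestamp first). The function names below are hypothetical and the model is an assumption for illustration, not how any particular TSDB lays out data. It also ignores write buffering (for example, LSM-style engines collect writes in memory and flush them as sorted runs), which softens the cost of random writes in practice.

```python
import bisect


def build_index(n_series, n_ticks, series_first):
    """Sorted key list for n_series series with n_ticks samples each (toy model)."""
    keys = [
        (s, t) if series_first else (t, s)
        for t in range(n_ticks)
        for s in range(n_series)
    ]
    keys.sort()
    return keys


def insert_positions(keys, tick, n_series, series_first):
    """Positions in the existing sorted order where the next tick's points would land."""
    positions = []
    for s in range(n_series):
        key = (s, tick) if series_first else (tick, s)
        positions.append(bisect.bisect_left(keys, key))
    return positions


def read_positions(keys, series, t_from, t_to, series_first):
    """Positions touched when reading one series over the window [t_from, t_to)."""
    positions = []
    for i, key in enumerate(keys):
        s, t = key if series_first else (key[1], key[0])
        if s == series and t_from <= t < t_to:
            positions.append(i)
    return positions


if __name__ == "__main__":
    for series_first in (True, False):
        keys = build_index(4, 3, series_first)
        label = "series-first" if series_first else "time-first"
        print(label, "writes:", insert_positions(keys, 3, 4, series_first))
        print(label, "reads: ", read_positions(keys, 2, 0, 3, series_first))
```

With series-first keys, the new tick's four points land at four separate places (one per series block), while reading series 2 touches one contiguous run. With time-first keys, all new points append at the end, but reading one series touches every `n_series`-th position, which is the read amplification described above.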