by chaotic-good 3333 days ago
This is one more example of a design in which one file holds many series and everything is chunked by time:

- "there is no longer a single file per series but instead a handful of files holds chunks for many of them"

- "We partition our horizontal dimension, i.e. the time space, into non-overlapping blocks. Each block acts as a fully independent database containing all time series data for its time window."

I don't believe this will work out well, because it will introduce read amplification at query time (compared to the file-per-series approach that they're using now). I'm also really curious how they managed to get 20M writes per second on a laptop. The article states that they're using the compression algorithm from the Gorilla paper, and the Gorilla paper's authors claim that they managed to get 1.5M on a single machine.

2 comments

It seems very much like the B+ tree approach is just a mental model put on top of the exact same idea that is being argued against. The initial list of "bad things about LSM approaches" has almost exactly the same items on it as the list of features the B+ approach claims to achieve.

Maybe I'm getting this all wrong, but aren't the leaves also representing chunked data, which is compressed?

The Prometheus solution also sequentially places compressed chunks for the same series. The time slicing actually has a lot of benefits and can simply be seen as the first level of the described B+ tree. An index of chunks for a series can then be seen as the second level.

The potential read amplification here seems completely equivalent. Just from my high-level view, all properties of the read and write path seem almost identical.
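The two-level view described above can be sketched roughly as follows. This is only a toy model, not Prometheus code: `ToyStore`, `Block`, and `BLOCK_SPAN` are hypothetical names, and the chunks are plain uncompressed lists. The first level selects time blocks that overlap the query range, and the second level is a per-series lookup inside each block, so only chunks of the requested series are touched.

```python
from collections import defaultdict

BLOCK_SPAN = 7200  # hypothetical block width in seconds (an assumption)


class Block:
    """One time window; holds chunks for many series, keyed by series id."""

    def __init__(self, start):
        self.start = start
        self.chunks = defaultdict(list)  # series id -> [(timestamp, value), ...]


class ToyStore:
    def __init__(self):
        self.blocks = {}  # block start time -> Block (first level)

    def append(self, series, ts, value):
        start = ts - ts % BLOCK_SPAN  # non-overlapping, aligned time windows
        block = self.blocks.setdefault(start, Block(start))
        block.chunks[series].append((ts, value))

    def query(self, series, t_min, t_max):
        """Return matching points and the number of chunks that were touched."""
        points, touched = [], 0
        for start in sorted(self.blocks):
            # First level: skip blocks that do not overlap [t_min, t_max].
            if start + BLOCK_SPAN <= t_min or start > t_max:
                continue
            # Second level: per-series index inside the block.
            chunk = self.blocks[start].chunks.get(series)
            if chunk is None:
                continue
            touched += 1
            points.extend((t, v) for t, v in chunk if t_min <= t <= t_max)
        return points, touched


store = ToyStore()
store.append("cpu", 10, 1.0)
store.append("mem", 20, 2.0)
store.append("cpu", 7300, 3.0)
print(store.query("cpu", 0, 100))
```

In this model the "mem" chunk sharing a block with "cpu" is never read, which is the sense in which the two designs look equivalent on the read path.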

>> Maybe I'm getting this all wrong, but aren't the leaves also representing chunked data, which is compressed?

Leaf nodes contain data from a single series (data that should be read together), whereas an SSTable with time-series data contains many series, and there is no guarantee that the query will use all of them.
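To put rough numbers on that concern, here is a back-of-the-envelope sketch. All sizes and function names are made up for illustration. It compares bytes read against bytes needed for three cases: a leaf that holds one series, a shared table that has to be read in full, and a shared table with a per-series index (the case the other side of this thread is describing).

```python
def amplification(bytes_read, bytes_needed):
    """Read amplification: how many bytes are read per byte the query needs."""
    return bytes_read / bytes_needed


def per_series_leaf(chunk_bytes):
    # The leaf holds only the requested series, so nothing extra is read.
    return amplification(chunk_bytes, chunk_bytes)


def shared_table_full_scan(chunk_bytes, series_in_table):
    # Assumption: the table must be read as a whole to find one series.
    return amplification(chunk_bytes * series_in_table, chunk_bytes)


def shared_table_indexed(chunk_bytes, index_bytes):
    # Assumption: a small index is read first, then only the wanted chunk.
    return amplification(index_bytes + chunk_bytes, chunk_bytes)


print(per_series_leaf(4096))              # 1.0
print(shared_table_full_scan(4096, 100))  # 100.0
print(shared_table_indexed(4096, 512))    # 1.125
```

Which of the last two cases applies depends on how the shared file is indexed, and that is exactly the point being argued.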

>> The Prometheus solution also sequentially places compressed chunks for the same series.

I'm not really that familiar with Prometheus internals, especially the indexing part. As I understand it, it doesn't align writes, so there is a lot of write amplification at the lower level, which translates to cell degradation and non-optimal performance, but I could be wrong here.

> I don't believe this will work out well, because it will introduce read amplification at query time (compared to the file-per-series approach that they're using now).

It'll end up about the same in practice; only the time series data that needs to be read is read.

Query performance is looking quite a bit better with this design.

> I'm also really curious how they managed to get 20M writes per second on a laptop.

I understand that was a micro-benchmark of one part of the system. The whole system is looking to be roughly in line with the Gorilla numbers.

> I understand that was a micro-benchmark of one part of the system. The whole system is looking to be roughly in line with the Gorilla numbers.

This makes sense now. I've found that the performance of the compression algorithm affects overall performance in a big way. On a modern SSD the entire workload is CPU-bound.
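For a sense of the per-sample CPU work involved, here is a simplified sketch of the XOR-based value encoding in the style of the Gorilla paper. It only counts the bits an encoder would emit rather than producing a bit stream, it ignores timestamp compression, and `xor_encoded_bits` is a hypothetical name, not taken from any of the systems discussed here.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float64 as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_encoded_bits(values):
    """Count the bits a simplified Gorilla-style XOR encoder would emit."""
    if not values:
        return 0
    total = 64  # the first value is stored verbatim
    prev = float_to_bits(values[0])
    prev_lead = prev_trail = None  # previous "meaningful bits" window
    for v in values[1:]:
        cur = float_to_bits(v)
        x = prev ^ cur
        prev = cur
        if x == 0:
            total += 1  # identical value: a single control bit
            continue
        lead = min(64 - x.bit_length(), 31)  # leading zeros, capped to fit 5 bits
        trail = (x & -x).bit_length() - 1    # trailing zeros
        if prev_lead is not None and lead >= prev_lead and trail >= prev_trail:
            # Fits the previous window: 2 control bits + the window's bits.
            total += 2 + (64 - prev_lead - prev_trail)
        else:
            # New window: 2 control bits, 5 bits of leading-zero count,
            # 6 bits of length, then the meaningful bits themselves.
            total += 2 + 5 + 6 + (64 - lead - trail)
            prev_lead, prev_trail = lead, trail
    return total


print(xor_encoded_bits([1.0] * 10))  # 73
print(xor_encoded_bits([1.0, 2.0]))  # 88
```

Every sample costs an XOR, two zero counts, and a few branches, so once storage is fast enough this kind of loop can plausibly become the limiting factor.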