by chaotic-good
3333 days ago
This is one more example of a design in which one file holds many series and everything is chunked by time:
- "there is no longer a single file per series but instead a handful of files holds chunks for many of them"
- "We partition our horizontal dimension, i.e. the time space, into non-overlapping blocks. Each block acts as a fully independent database containing all time series data for its time window."
I don't believe this will work out well, because it will introduce read amplification at query time (compared to the file-per-series approach that they're using now).
And I'm really curious how they managed to get 20M writes per second on a laptop. The article states that they're using the compression algorithm from the Gorilla paper, and the Gorilla paper's authors claim that they managed to get 1.5M on a single machine.
|
Maybe I'm getting this all wrong, but don't the leaves also represent chunked data, which is compressed?
The Prometheus solution also places compressed chunks for the same series sequentially. Time slicing actually has a lot of benefits and can simply be seen as the first level of the described B+ tree. An index of chunks for a series can then be seen as the second level.
The potential read amplification here seems completely equivalent. Just from my high-level view, all properties of the read and write paths seem almost identical.
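
To make the two-level view concrete, here is a rough sketch of that lookup: time blocks act as the first level, and a per-series chunk index inside each block acts as the second. This is not Prometheus's actual code or on-disk format; `Chunk`, `Block`, `query`, and `build_blocks` are hypothetical names, and chunks hold plain lists where a real store would hold compressed bytes. The `chunks_read` counter is only there to make the amount of data touched by a query visible.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    min_t: int  # first timestamp in the chunk (inclusive)
    max_t: int  # last timestamp in the chunk (inclusive)
    samples: list  # (timestamp, value) pairs; a real chunk would be compressed


@dataclass
class Block:
    min_t: int  # start of the block's time window (inclusive)
    max_t: int  # end of the block's time window (exclusive)
    # Hypothetical per-block index: series id -> chunks in time order.
    index: dict = field(default_factory=dict)


def query(blocks, series_id, start, end):
    """Return samples of one series with start <= t < end, plus chunks read."""
    out, chunks_read = [], 0
    for block in blocks:
        # Level 1: skip whole blocks that lie outside the query range.
        if block.max_t <= start or block.min_t >= end:
            continue
        # Level 2: consult the per-series chunk index inside the block.
        for chunk in block.index.get(series_id, []):
            if chunk.max_t < start or chunk.min_t >= end:
                continue
            chunks_read += 1
            out.extend((t, v) for t, v in chunk.samples if start <= t < end)
    return out, chunks_read


def build_blocks(series_ids, t_end, block_span, chunk_span):
    """Build toy blocks with one sample per time unit for every series."""
    blocks = []
    for b0 in range(0, t_end, block_span):
        block = Block(b0, b0 + block_span)
        for sid in series_ids:
            for c0 in range(b0, b0 + block_span, chunk_span):
                samples = [(t, float(t)) for t in range(c0, c0 + chunk_span)]
                chunk = Chunk(c0, c0 + chunk_span - 1, samples)
                block.index.setdefault(sid, []).append(chunk)
        blocks.append(block)
    return blocks
```

In this toy model a query that straddles a block boundary reads one chunk from each of the two blocks it overlaps, and never touches chunks of other series. Whether that also holds on disk depends on how chunks of different series are laid out within a block, which the sketch does not model.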