Hacker News
by valyala 1380 days ago
Both VictoriaMetrics and Grafana Mimir are well suited as long-term storage for Prometheus data. The main difference is the type of storage they use: VictoriaMetrics stores data on persistent disks (aka block storage), while Grafana Mimir stores data in S3-like object storage. Both block storage and object storage can be used for long-term storage, but at major cloud providers (AWS, GCP, Azure) they differ in the following ways:

- Object storage space usually costs 2x-8x less than block storage space.

- Object storage has up to 100x higher data access latency than block storage (hundreds of milliseconds for object storage vs. milliseconds for block storage).

- Block storage usually has a much lower network-related error rate than object storage. For example, it is common practice to retry reads from object storage on network errors, while block storage-based filesystems at major cloud providers are much more reliable in this respect.

- Cloud providers typically charge for every read operation on object storage, while reads from block storage are free. This point is often overlooked when comparing the costs of block storage and object storage.
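The retry practice for object storage reads mentioned above usually means retrying with exponential backoff. Below is a minimal generic sketch, assuming reads are idempotent and that transient network failures surface as `ConnectionError` or `TimeoutError`; `read_with_retries` and `flaky_get_object` are hypothetical names, not part of any S3 SDK.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def read_with_retries(
    read: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `read` until it succeeds, retrying only on `retryable` errors.

    Sleeps with capped exponential backoff plus full jitter between attempts
    and re-raises the last error once `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return read()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last network error
            # Backoff doubles each attempt, capped at max_delay; full jitter
            # spreads retries from many clients over time.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


# Hypothetical stand-in for an object storage GET that fails twice, then succeeds.
calls = []


def flaky_get_object() -> bytes:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("simulated transient network error")
    return b"block-data"


print(read_with_retries(flaky_get_object, base_delay=0.01))  # prints b'block-data'
```

Errors outside `retryable` (for example a missing object) propagate immediately, since retrying them would only add latency.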

Given these differences, block storage usually provides better performance than object storage. Block storage can also cost less than object storage when the stored data is read frequently.
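The point about frequently read data can be made concrete with a simple monthly cost model: storage cost plus per-request cost, with block storage reads treated as free as stated above. This is a rough sketch; the `StoragePricing` type and all prices below are hypothetical placeholders, not real quotes from any provider, and real bills include other components (provisioned IOPS, egress, request types) that this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePricing:
    """Hypothetical pricing for one storage tier (placeholders, not real quotes)."""
    per_gb_month: float  # cost of storing 1 GB for one month
    per_1k_reads: float  # cost of 1,000 read requests (0 for block storage)


def monthly_cost(p: StoragePricing, stored_gb: float, reads_per_month: float) -> float:
    """Storage cost plus read-request cost for one month."""
    return stored_gb * p.per_gb_month + reads_per_month / 1000 * p.per_1k_reads


def break_even_reads(block: StoragePricing, obj: StoragePricing, stored_gb: float) -> float:
    """Monthly read count above which block storage becomes cheaper than object storage.

    Solves monthly_cost(block, gb, r) == monthly_cost(obj, gb, r) for r.
    """
    per_read_diff = (obj.per_1k_reads - block.per_1k_reads) / 1000
    if per_read_diff <= 0:
        return float("inf")  # object reads are not more expensive: no crossover
    storage_diff = stored_gb * (block.per_gb_month - obj.per_gb_month)
    return max(0.0, storage_diff / per_read_diff)


# Placeholder prices for illustration only; check your provider's price list.
block = StoragePricing(per_gb_month=0.04, per_1k_reads=0.0)
obj = StoragePricing(per_gb_month=0.02, per_1k_reads=0.0004)
print(f"{break_even_reads(block, obj, stored_gb=1000):,.0f} reads/month")
```

With these placeholder numbers and 1,000 GB stored, block storage becomes cheaper above roughly 50 million object reads per month. Note that a single query against object storage may issue many read requests, which pushes real workloads toward the crossover faster.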

VictoriaMetrics is optimized for HDD-based block storage, so in most cases there is no need for more expensive SSD-based block storage. Additionally, VictoriaMetrics compresses production metrics 2x-10x better than Prometheus-like solutions that store data in object storage (Thanos, Cortex, Grafana Mimir), which further reduces long-term storage costs.

On top of this, the enterprise version of VictoriaMetrics can be configured to downsample historical data so that it takes less disk space [1].
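To illustrate why downsampling saves space, here is one simple, generic form of it: keep all recent samples, and for samples older than some offset keep only the last sample per fixed interval. This is a sketch of the general technique, not VictoriaMetrics' implementation; the `downsample` function and its `offset`/`interval` parameters are hypothetical.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample(samples: Iterable[Sample], now: int, offset: int, interval: int) -> List[Sample]:
    """Keep every sample with ts >= now - offset; for older samples keep only
    the last sample in each `interval`-second bucket.

    `samples` must be sorted by timestamp.
    """
    cutoff = now - offset
    result: List[Sample] = []
    last_bucket = None
    for ts, value in samples:
        if ts >= cutoff:
            result.append((ts, value))  # recent data is kept at full resolution
            continue
        bucket = ts // interval
        if result and bucket == last_bucket:
            result[-1] = (ts, value)  # a later sample in the same bucket replaces the earlier one
        else:
            result.append((ts, value))
        last_bucket = bucket
    return result
```

For example, a series scraped every 10 seconds and downsampled to 5-minute intervals keeps 1 out of every 30 historical samples.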

[1] https://docs.victoriametrics.com/#downsampling


To be fair, the benefit of object storage is that it is scalable and bottomless, while with EBS you have to deal with volume expansion and so on. Some folks find managing a fleet of EBS volumes no big deal; others find it problematic.

I think keeping long-term storage in an S3-compatible location is the way to go, but you need the ability to use local storage as a cache, so that queries on recent data, or on just the date range you're working with, can be fast.
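A local cache in front of object storage is typically a read-through LRU cache keyed by block or object name. The sketch below assumes a hypothetical `fetch_remote` callable that downloads one block by key; for brevity it keeps cached bytes in memory, whereas a real setup would write them to local disk.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """LRU cache of object-storage blocks on fast local storage (in memory here).

    `fetch_remote` is a hypothetical callable that downloads one block from
    S3-compatible storage by key.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity_bytes: int):
        self.fetch_remote = fetch_remote
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = self.fetch_remote(key)  # slow path: object storage round trip
        if len(data) <= self.capacity_bytes:
            self.entries[key] = data
            self.used_bytes += len(data)
            # Evict least recently used blocks until the cache fits again.
            while self.used_bytes > self.capacity_bytes:
                _, evicted = self.entries.popitem(last=False)
                self.used_bytes -= len(evicted)
        return data
```

Queries that repeatedly hit the same recent date range are then served from the local copy, and only cache misses pay object storage latency and per-request charges.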

Agreed. That's why we at VictoriaMetrics are investigating a hybrid storage scheme: store recently added data on block storage, while gradually moving older data from block storage to object storage in the background. On the query side, the requested data should be transparently fetched from both object storage and block storage.
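The hybrid scheme described above can be illustrated with a toy two-tier store: new samples go to a "block" tier, a background step moves old time partitions to an "object" tier, and queries read from both tiers transparently. This is only a sketch of the general idea under stated assumptions (daily partitions, in-memory dicts standing in for both tiers, a hypothetical `HybridStore` class); it is not VictoriaMetrics' design, which the comment describes as still under investigation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)
PARTITION_SECONDS = 86_400  # hypothetical partition size: one day


@dataclass
class HybridStore:
    """Toy two-tier store: recent partitions on block storage, older ones on object storage.

    Both tiers are in-memory dicts keyed by partition id; the partition size
    and `hot_partitions` are illustrative assumptions.
    """
    hot_partitions: int = 2
    block_tier: Dict[int, List[Sample]] = field(default_factory=dict)
    obj_tier: Dict[int, List[Sample]] = field(default_factory=dict)

    def add(self, ts: int, value: float) -> None:
        # Newly ingested samples always land on block storage.
        self.block_tier.setdefault(ts // PARTITION_SECONDS, []).append((ts, value))

    def move_old_partitions(self, now: int) -> None:
        """Background step: move partitions outside the hot window to object storage."""
        oldest_hot = now // PARTITION_SECONDS - self.hot_partitions + 1
        for pid in [p for p in self.block_tier if p < oldest_hot]:
            # Merge rather than overwrite, in case late samples arrived for
            # a partition that was already moved.
            self.obj_tier.setdefault(pid, []).extend(self.block_tier.pop(pid))

    def query(self, start: int, end: int) -> List[Sample]:
        """Return samples with start <= ts <= end from both tiers, sorted by time."""
        out: List[Sample] = []
        for pid in range(start // PARTITION_SECONDS, end // PARTITION_SECONDS + 1):
            for tier in (self.block_tier, self.obj_tier):
                out.extend(s for s in tier.get(pid, []) if start <= s[0] <= end)
        return sorted(out)
```

The caller of `query` never needs to know which tier holds a partition; in a real system the object tier lookups would go through a cache like the one discussed earlier in the thread.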