by mcpherrinm 666 days ago
Prometheus itself is pretty simple and fairly robust, but it doesn't necessarily scale as well for long-term storage. Things like VictoriaMetrics, Mimir, and Thanos tend to be a bit more scalable for longer-term storage of metrics.

For a few hundred gigs of metrics, I’ve been fine with Prometheus and some ZFS-send backups.


Just to expand on some of my experiences with the software you listed.

Thanos's architecture is quite different from the others you've listed. Unlike them, Thanos fans queries out to the remote Prometheus instances for hot data, while a sidecar next to each Prometheus ships older data (typically blocks older than 2 hours) to S3 storage. Because query routing depends on the Prometheus external labels, our developers' queries would often fan out unnecessarily to multiple Prometheus instances. This happened because our developers usually search for metrics by service name or some other service-related label, rather than by the external label Thanos relies on, which describes where the workload runs.
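To make the fan-out problem concrete, here is a rough Python model of store selection. It is a simplification, not Thanos's actual code: it assumes a querier can only skip a Prometheus instance when a query matcher contradicts one of that instance's external labels. The store names, the `cluster` external label, and the `service` label are hypothetical, chosen just for illustration.

```python
"""Simplified model of how a fan-out querier might pick which stores to ask.

Not Thanos code: it only mimics the idea that a store can be skipped when a
query matcher contradicts one of the store's external labels.
"""
from dataclasses import dataclass, field


@dataclass
class Store:
    name: str
    # External labels advertised by this Prometheus (hypothetical values).
    external_labels: dict[str, str] = field(default_factory=dict)


def select_stores(stores: list[Store], matchers: dict[str, str]) -> list[str]:
    """Return the names of stores that could hold series matching `matchers`.

    Only equality matchers are modelled. A matcher on a label that is not an
    external label cannot rule a store out, because that label may still
    exist on the series stored inside it.
    """
    selected = []
    for store in stores:
        conflicts = any(
            label in store.external_labels and store.external_labels[label] != value
            for label, value in matchers.items()
        )
        if not conflicts:
            selected.append(store.name)
    return selected


stores = [
    Store("prom-eu", {"cluster": "eu-1"}),
    Store("prom-us", {"cluster": "us-1"}),
    Store("prom-ap", {"cluster": "ap-1"}),
]

# Querying by a service label alone: no external label constrains it, so every store is asked.
print(select_stores(stores, {"service": "checkout"}))
# Adding the external label narrows the fan-out to a single store.
print(select_stores(stores, {"service": "checkout", "cluster": "eu-1"}))
```

In this model the first query hits all three stores and the second hits only `prom-eu`. Since a fanned-out query can't finish until every selected store has answered, the broad query is bounded by the slowest instance.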

Upon identifying this, I migrated to Mimir, and we saw an immediate drop in response times for developer queries, which no longer have to wait for the slowest Prometheus instance before displaying data.

We've since also adopted OpenTelemetry in our workloads and ingest OTLP directly into Mimir (which VictoriaMetrics also supports).