HN Mirror


by SuperQue 1545 days ago

One interesting question I have is in regard to global availability.

With our current Thanos deployment, we can tie a single geo-regional deployment together with a tiered query engine.

Basically like this:

"Global Query Layer" -> "Zone Cluster Query Layer" -> "Prom Sidecar / Thanos Store"

We can duplicate the "Global Query Layer" in multiple geo regions, each with its own replicated Grafana instance. If a single region/zone has trouble, we can still access metrics in the other regions/zones. This avoids Thanos having any SPoFs for large multi-user (Dev/SRE) orgs.
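A rough sketch of that tiering, in Python purely for illustration. This is a toy model, not Thanos code: `Leaf`, `QueryLayer`, and the `labels` helper are hypothetical names, and the real components speak gRPC rather than calling each other in-process. The point it tries to show is that any number of global layers can sit over the same zone layers, and a dead zone only removes its own series from the result.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A series is (labels, value); labels are a sorted tuple of (name, value) pairs.
Series = Tuple[Tuple[Tuple[str, str], ...], float]


def labels(**kv: str) -> Tuple[Tuple[str, str], ...]:
    """Hypothetical helper: build a hashable, ordered label set."""
    return tuple(sorted(kv.items()))


@dataclass
class Leaf:
    """Stands in for a Prom sidecar or Thanos Store (toy model only)."""
    series: List[Series]
    healthy: bool = True

    def query(self, metric: str) -> List[Series]:
        if not self.healthy:
            raise ConnectionError("leaf unreachable")
        return [s for s in self.series if dict(s[0]).get("__name__") == metric]


@dataclass
class QueryLayer:
    """Fans a query out to children: leaves or other query layers."""
    children: list

    def query(self, metric: str) -> List[Series]:
        results: List[Series] = []
        failures = 0
        for child in self.children:
            try:
                results.extend(child.query(metric))
            except ConnectionError:
                failures += 1  # tolerate a partial outage
        if self.children and failures == len(self.children):
            raise ConnectionError("all children unreachable")
        return results


# Two zone layers; zone "b" is down.
zone_a = QueryLayer([Leaf([(labels(__name__="up", zone="a"), 1.0)])])
zone_b = QueryLayer([Leaf([(labels(__name__="up", zone="b"), 1.0)], healthy=False)])

# The global layer is duplicated per geo region over the same zone layers.
global_eu = QueryLayer([zone_a, zone_b])
global_us = QueryLayer([zone_a, zone_b])

eu_result = global_eu.query("up")
us_result = global_us.query("up")
```

Both global layers still answer with zone "a" data while zone "b" is unreachable, which is the no-SPoF property described above. Whether a partial answer is acceptable or should be an error is a policy choice; this sketch simply assumes partial results are fine.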

2 comments

ddreier 1544 days ago

This is one of my favorite things about Thanos. We run Prometheus in multiple private datacenters, multiple AWS regions across multiple AWS accounts, and multiple Azure regions across multiple subscriptions. We have three global labels: cloud, region, and environment. With Thanos's Store/Querier architecture we have a single Datasource in Grafana where we can quickly query any metric from any environment across the breadth of our infrastructure.
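To make the three-global-labels idea concrete, here is a minimal Python sketch of equality matching on those labels. The label names (cloud, region, environment) come from the setup above; the label values and the `select` function are made up for illustration, and real queries would of course go through PromQL rather than anything like this.

```python
from typing import Dict, List


def select(series: List[Dict[str, str]], **matchers: str) -> List[Dict[str, str]]:
    """Return series whose labels equal every matcher (equality only).

    Roughly what a PromQL selector such as
    up{cloud="aws", environment="prod"} expresses.
    """
    return [s for s in series if all(s.get(k) == v for k, v in matchers.items())]


# Hypothetical series as seen through one global query endpoint.
series = [
    {"__name__": "up", "cloud": "aws", "region": "us-east-1", "environment": "prod"},
    {"__name__": "up", "cloud": "azure", "region": "westeurope", "environment": "prod"},
    {"__name__": "up", "cloud": "private", "region": "dc1", "environment": "staging"},
]

prod = select(series, environment="prod")                  # every prod series, any cloud
aws_prod = select(series, cloud="aws", environment="prod")  # narrowed to one cloud
```

Because every series carries the same three labels regardless of where it was scraped, one data source is enough and the narrowing happens in the query itself.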

It's really a shame that Loki in particular doesn't share this kind of architecture. Seems like Mimir, frustratingly, will share this deficiency.

bboreham 1545 days ago

The typical way to run Mimir is centralised, with different regions/datacenters feeding metrics into one place. You can run that central system across multiple AZs.

If you run Mimir with an object store (e.g. S3) that supports replication then you can have copies in multiple geographies and query them, but the copies will not have the most recent data.
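A toy illustration of why the replicated copies trail behind, in Python. It assumes data lands in the object store as time-ranged blocks and that replication adds some delay on top; the `Block` type, the function names, and all the numbers (minutes, a 30-minute lag) are hypothetical and not Mimir defaults.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    min_t: int        # start of the covered time range (minutes)
    max_t: int        # end of the covered time range (minutes)
    uploaded_at: int  # when the block reached the primary object store


def visible_in_replica(blocks: List[Block], now: int, replication_lag: int) -> List[Block]:
    """Blocks that have had time to replicate to the remote copy."""
    return [b for b in blocks if b.uploaded_at + replication_lag <= now]


def newest_queryable(blocks: List[Block], now: int, replication_lag: int) -> Optional[int]:
    """Latest timestamp the replica can answer for, or None if it has nothing."""
    visible = visible_in_replica(blocks, now, replication_lag)
    return max((b.max_t for b in visible), default=None)


# Made-up timeline: two blocks, each uploaded once its time range is complete.
blocks = [
    Block(min_t=0, max_t=120, uploaded_at=120),
    Block(min_t=120, max_t=240, uploaded_at=240),
]

now = 250
newest = newest_queryable(blocks, now=now, replication_lag=30)
staleness = now - newest if newest is not None else None
```

With these numbers the second block has not replicated yet, so the remote copy can only answer up to minute 120 while the central system is at minute 250.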

(Note: I work on Mimir.)