by user5994461 2163 days ago
Alright. More data in total but less per server because it's distributed.

We run a pair of servers both storing everything. It's the least that can be done to have any resiliency.

I would love to distribute the data, preferably per continent, but Prometheus didn't have a good story on sharding. Running independent datasets is worthless in practice without the ability to aggregate. Also, the more servers, the more expensive (and they're not easy to procure). Running 6 Prometheus servers is in the same ballpark as paying for Datadog, so might as well just pay for it.

1 comment

We don't care so much about resiliency; we do back up the Prometheus folder using the snapshot API.
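
For anyone curious, a snapshot-based backup can be triggered roughly like this. It's only a sketch, not our actual script: it assumes Prometheus was started with `--web.enable-admin-api`, and the `PROMETHEUS_URL` value and the function names are made up for illustration.

```python
import json
import urllib.request

# Assumption: a Prometheus server reachable at this address, started with
# --web.enable-admin-api (the snapshot endpoint is disabled otherwise).
PROMETHEUS_URL = "http://localhost:9090"


def snapshot_endpoint(base_url: str) -> str:
    """Build the URL of the TSDB snapshot admin endpoint."""
    return base_url.rstrip("/") + "/api/v1/admin/tsdb/snapshot"


def snapshot_name(response_body: str) -> str:
    """Extract the snapshot directory name from the API's JSON response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {payload}")
    return payload["data"]["name"]


def take_snapshot(base_url: str = PROMETHEUS_URL) -> str:
    """POST to the snapshot endpoint and return the new snapshot's name."""
    request = urllib.request.Request(snapshot_endpoint(base_url), method="POST")
    with urllib.request.urlopen(request, timeout=60) as response:
        return snapshot_name(response.read().decode("utf-8"))
```

The returned name is a directory under `snapshots/` inside the Prometheus data directory, which can then be copied off the box with whatever backup tool is already in place.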

There are a couple of articles about sharding and federation with Prometheus; dunno if they existed when you tried it.
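
As a rough idea of what federation looks like on the wire: a global server scrapes each shard's `/federate` endpoint, passing one or more `match[]` selectors for the series it wants. The sketch below just builds that URL; the hostname and the selectors are hypothetical examples, not anything from our setup.

```python
from urllib.parse import urlencode


def federate_url(base_url: str, selectors: list[str]) -> str:
    """Build a /federate URL that pulls the series matching each selector."""
    # Each selector becomes its own match[] query parameter.
    query = urlencode([("match[]", selector) for selector in selectors])
    return base_url.rstrip("/") + "/federate?" + query


# Hypothetical per-datacenter shard and selectors, for illustration only.
url = federate_url(
    "http://prometheus-dc1.example.internal:9090",
    ['{job="node"}', '{__name__=~"job:.*"}'],
)
print(url)
```

In practice the global server would usually only pull pre-aggregated recording-rule series this way, not every raw series from every shard.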

For us, problems are usually local to a datacenter. Having a dropdown where you can pick the datacenter has proven good enough. It is unlikely that we have a global issue in a service.

Sorry if unclear, but we have our own datacenters. Our Prometheus VMs are essentially free in the grand scheme of things, considering the amount of compute we have.