by user5994461
2159 days ago
You're lucky to have a team. I was doing that with one other person in our spare time and got 0 hours/week to spend on it. We only give ourselves a couple of weeks to set up new software and tune everything perfectly, then it has to be done for good. We check maybe once a quarter for capacity adjustments or software upgrades. (I also run a logging system ingesting 1 TB/day with no supervision.)

I should probably say that my experience with Datadog goes back as far as 5 years. I already had monitoring working perfectly back then, when Prometheus didn't exist, let alone the exporter plugins! So Prometheus is really late and subpar to me. ^^

Looks like we have a similar amount of data in Prometheus as you (1 TB for 60 days) but with 40% of the hosts. Maybe you have many small VMs? We have physical hosts with quad CPUs (per-CPU metrics), network interfaces and so on (I couldn't tune the node exporter to ignore disabled interfaces and some useless devices). Check how many distinct time series you have: prometheus_tsdb_head_series.

Datadog had amazing support for custom metrics (but watch out for extra billing and cardinality!). Applications can just send metrics to localhost:1234, where the agent is listening, and they're enriched automatically with host information and environment. Magic.

This reminds me: Prometheus is broken with its idea of pulling metrics, when metrics should be pushed instead. Applications and hosts should push metrics when they come online; it's not the responsibility of the metrics storage to know about every goddamn thing running in the company and try to talk to it (it can't cross firewalls anyway). Prometheus worked okay enough for the last company, which was on premise with fixed hosts (weeks or months to move anything physical), but it's de facto broken for the previous company, which was on AWS with instances created intraday.
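
For anyone who hasn't seen the push model, here is a minimal sketch of an application sending a metric to a local agent over UDP. It assumes a StatsD-style line format (`name:value|type`, with the DogStatsD `|#key:value` tag extension). The function names are made up for illustration, and port 1234 is just the placeholder used above, so use whatever port your agent actually listens on. Host and environment enrichment would happen in the agent, not here.

```python
import socket


def format_metric(name, value, metric_type="g", tags=None):
    """Build a StatsD-style datagram: name:value|type[|#k:v,k:v]."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Tag suffix is a DogStatsD extension; sorted for a stable output.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line.encode("utf-8")


def push_metric(name, value, metric_type="g", tags=None,
                host="127.0.0.1", port=1234):
    """Fire-and-forget a single metric to the local agent over UDP.

    Port 1234 is the placeholder from the comment above (an assumption,
    not a documented default).
    """
    payload = format_metric(name, value, metric_type, tags)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # Hypothetical metric name and tag, for illustration only.
    push_metric("app.queue_depth", 42, tags={"env": "prod"})
```

Since it's UDP, the application neither blocks nor fails if the agent is down, which is a big part of why this pattern is so cheap to adopt.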
|
We disabled a lot of useless metrics in the node exporter. I think pulling works OK if you have a service discovery mechanism. We hook Prometheus to Consul.
We have a mix of very small VMs and very beefy bare metal.
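
To illustrate what service discovery buys you with pulling, here is a rough sketch that maps Consul catalog entries to a Prometheus file-based discovery target list. Prometheus has built-in Consul discovery, so you would not normally write this yourself, and this is not necessarily how our setup works. The sample entries, the function name and the label names are made up. The field names follow the Consul catalog API response, and 9100 is the node exporter's default port.

```python
import json


def consul_to_file_sd(catalog_entries, extra_labels=None):
    """Map Consul catalog service entries to Prometheus file_sd target groups."""
    groups = []
    for entry in catalog_entries:
        # Consul leaves ServiceAddress empty when the service uses the node address.
        address = entry.get("ServiceAddress") or entry["Address"]
        target = f"{address}:{entry['ServicePort']}"
        labels = {"node": entry["Node"], "service": entry["ServiceName"]}
        labels.update(extra_labels or {})
        groups.append({"targets": [target], "labels": labels})
    return groups


# Hypothetical catalog entries, shaped like /v1/catalog/service/<name> output.
sample = [
    {"Node": "vm-small-1", "Address": "10.0.0.5", "ServiceName": "node-exporter",
     "ServiceAddress": "", "ServicePort": 9100},
    {"Node": "metal-1", "Address": "10.0.1.9", "ServiceName": "node-exporter",
     "ServiceAddress": "10.0.1.10", "ServicePort": 9100},
]

if __name__ == "__main__":
    print(json.dumps(consul_to_file_sd(sample, {"env": "prod"}), indent=2))
```

The point is that hosts register themselves in Consul when they come online, and Prometheus only has to follow the catalog, so short-lived instances get picked up without anyone editing a target list by hand.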