by baby_souffle
664 days ago
> you can detect stalled metrics (per host or service), who didn't send the data on time, etc

I guess the difference here is that we leverage service discovery in Prometheus for this instead of having to externally build an authoritative list of who/what should have pushed metrics.

> <...> and wait for a response.

As opposed to waiting for $thing to push metrics to you? I guess I'm not convinced that one architecture is obviously better? There might be some downsides to a particular implementation but generally they both work and only external constraints will dictate which you use? E.g.: if you're required to ship metrics to multiple places, pushing to graphite and datadog becomes easier.

Anything that _should_ be scraped is tagged a certain way and anything that doesn't respond to a scrape gets flagged. After a few flags, an operator is paged. When $thing is destroyed or re-provisioned, different tags lead to a different set of $things to scrape metrics from.
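
To make the tag, flag, page flow concrete, here is a rough Python sketch of the logic. It is not how Prometheus implements it (there, a failed scrape shows up as the `up` series going to 0 and an alerting rule does the counting). `Target`, `ScrapeMonitor`, the `scrape-me` tag and the threshold of 3 are all made up for illustration.

```python
from dataclasses import dataclass, field

FLAG_THRESHOLD = 3  # hypothetical: consecutive missed scrapes before paging


@dataclass
class Target:
    name: str
    tags: set


@dataclass
class ScrapeMonitor:
    scrape_tag: str = "scrape-me"  # hypothetical tag name
    threshold: int = FLAG_THRESHOLD
    flags: dict = field(default_factory=dict)

    def discover(self, inventory):
        """Return names of targets carrying the scrape tag."""
        wanted = {t.name for t in inventory if self.scrape_tag in t.tags}
        # Destroyed or re-tagged targets drop out, so they never page anyone.
        self.flags = {n: c for n, c in self.flags.items() if n in wanted}
        return wanted

    def record(self, wanted, responded):
        """Update flag counts after one scrape round; return targets to page on."""
        to_page = []
        for name in sorted(wanted):
            if name in responded:
                self.flags[name] = 0  # a successful scrape clears the flags
            else:
                self.flags[name] = self.flags.get(name, 0) + 1
                # Page once, at the moment the threshold is reached.
                if self.flags[name] == self.threshold:
                    to_page.append(name)
        return to_page
```

The point of the sketch is that the "who should be reporting" list is derived from tags on every round rather than maintained by hand, so re-provisioning a host changes the expected set without anyone editing a list.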