by krullie 3647 days ago
I've deployed almost exactly this on a new Kubernetes cluster running CoreOS exclusively, with 4 nodes, 3 masters and 3 etcd instances spread over 4 physical machines.

What I haven't implemented yet is deploying the node-exporters via DaemonSets and the Prometheus config through a ConfigMap. Right now those are done through a cloud-config systemd override and a gluster mount.

A couple of things I ran into. The kubelets running the Kubernetes master components need access to the SSL certificates of the apiserver; otherwise Prometheus cannot scrape them over HTTPS when they communicate with the apiserver over HTTPS. And I'm still very confused about a seemingly simple thing: getting a per-request response time query. This is what I'm using now:

`irate(request_processing_seconds_sum{app="myapp", method=~"$method"}[5m]) / irate(request_processing_seconds_count{app="myapp", method=~"$method"}[5m])`

Two things: I'm not sure what I should set the range vector duration (`[5m]`) to, and I don't know how to get the response time of an individual request. We've had a request take 30 seconds, but that spike only showed up when viewing the graph over a 3 hour time period. When viewing it over 6 hours or 1 hour it simply does not show up, even though it happened in the last hour.
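
For what it's worth, here is a rough Python sketch of one possible explanation for the disappearing spike. It is a simplified model, not Prometheus itself. `irate` only looks at the last two samples inside the range, and Grafana evaluates the query once per step, where the step depends on the selected time range. If no evaluation point lands right after the slow request, the spike never appears. The 15 s scrape interval, the traffic pattern, and the function names below are all made-up assumptions for illustration.

```python
# Simplified model of sum/count counters scraped at a fixed interval.
# All numbers and names here are hypothetical, chosen only for illustration.

def build_samples(scrape_interval=15, duration=3600, slow_at=1815):
    """Return (timestamp, seconds_sum, request_count) tuples, one per scrape."""
    samples = []
    total_seconds = 0.0
    total_count = 0
    for t in range(0, duration + 1, scrape_interval):
        if t > 0:
            # Ten requests of 0.1 s each between scrapes.
            total_seconds += 1.0
            total_count += 10
            if t == slow_at:
                # One slow request of 30 s lands in this scrape interval.
                total_seconds += 30.0
                total_count += 1
        samples.append((t, total_seconds, total_count))
    return samples


def _window(samples, eval_time, window):
    # Samples inside the range selector (boundary handling is simplified).
    return [s for s in samples if eval_time - window <= s[0] <= eval_time]


def irate_ratio(samples, eval_time, window=300):
    """Mimic irate(sum[w]) / irate(count[w]): last two samples only."""
    in_window = _window(samples, eval_time, window)
    if len(in_window) < 2:
        return None
    (_, s0, c0), (_, s1, c1) = in_window[-2], in_window[-1]
    return None if c1 == c0 else (s1 - s0) / (c1 - c0)


def rate_ratio(samples, eval_time, window=300):
    """Mimic rate(sum[w]) / rate(count[w]): first and last sample in the
    window, ignoring the extrapolation that Prometheus applies."""
    in_window = _window(samples, eval_time, window)
    if len(in_window) < 2:
        return None
    (_, s0, c0), (_, s1, c1) = in_window[0], in_window[-1]
    return None if c1 == c0 else (s1 - s0) / (c1 - c0)


def graph(fn, samples, step, start=300, end=3600):
    """Evaluate fn once per graph step, the way a dashboard panel would."""
    return [fn(samples, t) for t in range(start, end + 1, step)]


if __name__ == "__main__":
    samples = build_samples()
    print("irate, 15 s step:", max(graph(irate_ratio, samples, 15)))
    print("irate, 60 s step:", max(graph(irate_ratio, samples, 60)))
    print("rate,  60 s step:", max(graph(rate_ratio, samples, 60)))
```

In this model the `irate` version shows the spike only when the step is fine enough to hit the scrape that contains it, while the `rate` version shows it at any step up to the range duration, at the cost of diluting it. Either way the result is an average over many requests, so the 30 seconds of a single request cannot be read back from `_sum` and `_count` alone; a histogram with buckets is probably closer to what is needed for that.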

One thing I really love is the ability to configure services to be scraped by setting annotations on them. It works great when slowly transitioning your services to Prometheus-style metrics!

If anybody has some pointers about querying and visualizing using Grafana and Prometheus, that would be great!