by SkipperCat
1763 days ago
Dashboards are invaluable. Humans can take in a lot of data from images and there is no better way to grok data than a graph. We've spent a lot of time building Grafana dashboards and they've been extremely helpful with debugging. They don't solve all problems but they certainly help narrow down where to look. Sure, we still look at log files, use htop and a lot of other tools, but our first stop is always Grafana. I suggest almost any book by Edward Tufte. There you'll see the beauty and value of visual information.
|
This is what I was about to write. Most of our services have 1 or 2 dashboards showing some service KPIs, for example HTTP request throughput and response time, and also interface metrics to other subsystems: queries to postgres, messages to the message bus and so on.
With dashboards like this, you can very quickly build a deduction chain of "The customer opened a ticket, well, because the 75th percentile of our response time went to 20 seconds, well, because our database response times spiked to an ungodly number".
And then you can go to the dashboards about the database, and quickly narrow it down to the subsystem of the database: is something consuming the entire CPU of the database, is the IO blocked, is the network there slow?
In the happy cases, you can go from "The cluster is down" to "our database is blocked by a query" within a minute by looking at a few boards. That's very, very powerful and valuable.
And sure, at that point, the dashboards aren't useful anymore. But a map doesn't lose value because you can now see your target.
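
For anyone wondering what a "75th percentile of the response time" panel boils down to, here's a minimal sketch in Python. The sample numbers and the names `http_seconds` and `db_seconds` are made up for illustration, and it uses the simple nearest-rank definition; real monitoring stacks usually estimate percentiles from histogram buckets instead, so treat this as the idea rather than how Grafana computes it.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in seconds, collected over the same window.
http_seconds = [0.2, 0.3, 0.25, 0.4, 18.0, 21.0, 0.35, 22.5]
db_seconds = [0.05, 0.04, 17.5, 20.2, 0.06, 21.9, 0.05, 0.07]

print("HTTP p75:", percentile(http_seconds, 75))  # 18.0
print("DB p75:  ", percentile(db_seconds, 75))    # 17.5
```

Seeing both numbers jump together is the "well, because" step in the chain: the HTTP percentile is bad, and the database percentile explains most of it.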