by kasey_junk
3424 days ago
> A histogram of the duration it took to serve a response to a request, also labelled by successes or errors.

This is so much easier said than done. Most time series databases that people use to instrument things quite simply cannot handle histogram data correctly. They make incorrect assumptions about the way roll-ups can happen, or they require you to be specific about resolution requirements before you can know them well.

Then histogram data tends to be very expensive to query, so it bogs down, preventing you from making the kinds of queries that are really valuable for diagnosing performance regressions.

Finally, the visualization systems for histograms are really difficult because you need a third dimension to see them over time. Heat maps accomplish this but are hard to read at times, and most dashboard systems don't have great visualization options for "show this time period next to this time period", which is an incredibly common requirement when comparing latency histograms.
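To make the roll-up point concrete, here is a minimal Python sketch of one way a roll-up can go wrong: averaging per-window percentiles instead of merging bucket counts. The bucket bounds, helper names, and latency values are all made up for illustration and do not describe any particular time series database.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in milliseconds (an assumption for this sketch).
BOUNDS = [10, 50, 100, 500, 1000]

def to_histogram(latencies_ms):
    """Count values per bucket; the final slot catches anything above the last bound."""
    counts = [0] * (len(BOUNDS) + 1)
    for value in latencies_ms:
        counts[bisect_left(BOUNDS, value)] += 1
    return counts

def merge(a, b):
    """Roll two histograms up by adding bucket counts, which loses no information."""
    return [x + y for x, y in zip(a, b)]

def percentile_bound(counts, q):
    """Upper bound of the bucket that contains the q-th quantile (0 < q <= 1)."""
    target = q * sum(counts)
    running = 0
    for i, count in enumerate(counts):
        running += count
        if running >= target:
            return BOUNDS[i] if i < len(BOUNDS) else float("inf")
    return float("inf")

# Two made-up time windows: a busy, fast one and a quiet, slow one.
window_a = to_histogram([8] * 990 + [40] * 10)   # 1000 requests
window_b = to_histogram([900] * 10)              # 10 requests

# Incorrect roll-up: average the per-window medians, ignoring request volume.
averaged = (percentile_bound(window_a, 0.5) + percentile_bound(window_b, 0.5)) / 2

# Roll-up that preserves the distribution: merge counts, then take the median.
merged = merge(window_a, window_b)
rolled_up = percentile_bound(merged, 0.5)

print(averaged)   # 505.0
print(rolled_up)  # 10
```

In this contrived case the averaged median lands at 505 ms even though well over half of all requests finished within 10 ms. Merging counts only works when both windows share the same bucket bounds, which is the "decide your resolution up front" constraint mentioned above.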
We don't have the visualizations for histograms yet (though you can chart specific percentiles), but for the reasons you mention, Honeycomb is perfectly suited to give you that kind of data. I can't say we'll get that out the door soon, but it's one of my most-wanted pet features, so as soon as I can convince myself it's actually more important than the mountain of other things that need to get done, you'll get your histograms and your time-over-time comparisons.
I've been advocating for a heat-map-style presentation of histograms for a long time, but I hadn't considered the difficulty that creates when trying to show time over time. That's an interesting one to noodle on.
Thanks for articulating so well the value of histograms and the reasons they're difficult to implement!
(bias alert - I work on Honeycomb)