by samstokes 2536 days ago
That's the "monitor from the customer's point of view" approach the OP alludes to. If you use a tool like Honeycomb [1] that can easily and routinely answer questions like "show me the 95th percentile latencies for each of the 10 customers experiencing the worst latencies", then situations like the one you're describing are a lot easier to discover.

[1] https://honeycomb.io. Disclaimer: I used to work for them.
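
To make that kind of question concrete, here is a minimal sketch in plain Python. It is not Honeycomb's API or query language; the event shape (customer_id, latency_ms) and the names p95 and worst_customers_by_p95 are assumptions made up for illustration.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # 1-based nearest rank, computed with integers: ceil(0.95 * n).
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def worst_customers_by_p95(events, top_n=10):
    """Return up to top_n (customer_id, p95_latency) pairs, worst first.

    `events` is an iterable of (customer_id, latency_ms) tuples; this
    shape is hypothetical, not any particular vendor's schema.
    """
    by_customer = defaultdict(list)
    for customer_id, latency_ms in events:
        by_customer[customer_id].append(latency_ms)

    # Sort on the p95 value, highest latency first.
    ranked = sorted(
        ((p95(latencies), customer) for customer, latencies in by_customer.items()),
        reverse=True,
    )
    return [(customer, latency) for latency, customer in ranked[:top_n]]
```

Grouping by customer before taking the percentile is the point: a global p95 can look healthy while one customer's p95 is terrible.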

1 comment

>That's the "monitor from the customer's point of view" approach the OP alludes to.

...but now you have a recursive problem: who watches the watcher? If the watcher goes down, your insights are gone. Do you devote your entire engineering staff to monitoring, then?

A two-pronged approach would be better: Customer Touch-Point (CTP) monitoring built into your product, plus external monitoring in case the CTP monitoring goes down. If the external monitoring goes down, you still have the CTP monitoring, so not all visibility is lost.
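
A rough sketch of that idea in Python, assuming each monitor reports a heartbeat timestamp somewhere the other can read it. The function name, the 60-second staleness threshold, and the "full" / "degraded" / "blind" labels are all hypothetical, chosen only to illustrate the point.

```python
def visibility(ctp_last_seen, external_last_seen, now, max_age_s=60.0):
    """Classify how much monitoring visibility is left.

    All arguments are timestamps in seconds. A monitor counts as alive
    if its last heartbeat is at most max_age_s old (assumed threshold).
    """
    ctp_alive = (now - ctp_last_seen) <= max_age_s
    external_alive = (now - external_last_seen) <= max_age_s

    if ctp_alive and external_alive:
        return "full"      # both prongs are reporting
    if ctp_alive or external_alive:
        return "degraded"  # one prong is down, the other still sees customers
    return "blind"         # both prongs are down
```

Only the "blind" case loses all insight, and it requires both monitors to fail at the same time.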