by samstokes
2536 days ago
That's the "monitor from the customer's point of view" approach the OP alludes to. If you use a tool like Honeycomb [1] that can easily and routinely answer questions like "show me the 95th-percentile latency for each of the 10 customers experiencing the worst latency", then situations like the one you're describing are a lot easier to discover.

[1] https://honeycomb.io

Disclaimer: I used to work for them.
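To make that kind of question concrete, here is a rough sketch of the same computation in plain Python. It is not Honeycomb's query language or API. The event shape (`customer_id`, `latency_ms` pairs) and both function names are made up for illustration, and the percentile uses the simple nearest-rank definition.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    # Ceiling of 0.95 * n, done in integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100  # 1-based rank
    return ordered[rank - 1]


def worst_customers_by_p95(events, top_n=10):
    """Return up to top_n (customer_id, p95_latency) pairs, worst first.

    `events` is an iterable of (customer_id, latency_ms) tuples. This shape
    is an assumption for the sketch, not any real tool's data model.
    """
    by_customer = defaultdict(list)
    for customer_id, latency_ms in events:
        by_customer[customer_id].append(latency_ms)

    p95_by_customer = {c: p95(v) for c, v in by_customer.items()}
    ranked = sorted(p95_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]
```

The point of grouping by customer before taking the percentile is that one customer's terrible experience can't hide inside a healthy global p95.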
|
...but now you have a recursive problem: who watches the watcher? If the watcher goes down, your insights are gone. Do you devote your entire engineering staff to monitoring, then?
A two-pronged approach would be better: Customer Touch-Point (CTP) monitoring built into your product, plus external monitoring in case your CTP monitoring goes down. If your external monitoring goes down, you still have the CTP monitoring, so not all visibility is lost.
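One way to picture the two prongs is as two watchers that each emit a heartbeat, with a check on how fresh each one is. The sketch below is only an illustration under assumptions: the watcher names, the 60-second staleness threshold, and the `visibility` function are all hypothetical, not taken from any particular monitoring product.

```python
def visibility(last_heartbeats, now, max_age_s=60.0):
    """Classify how much monitoring visibility is left.

    `last_heartbeats` maps a watcher name (e.g. "ctp", "external") to the
    timestamp, in seconds, of its most recent heartbeat. A watcher counts as
    alive if its last heartbeat is at most `max_age_s` seconds old.
    Returns a (status, stale_watchers) tuple.
    """
    stale = sorted(
        name for name, beat in last_heartbeats.items() if now - beat > max_age_s
    )
    if not stale:
        status = "full"
    elif len(stale) < len(last_heartbeats):
        status = "degraded"  # at least one prong still sees the system
    else:
        status = "blind"  # every watcher is down
    return status, stale
```

For example, `visibility({"ctp": 1000.0, "external": 900.0}, now=1010.0)` reports `("degraded", ["external"])`: the external monitor is stale, but the CTP monitoring still provides some visibility.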