|
|
|
|
|
by fragmede
891 days ago
|
|
This is easy for me to say in hindsight, but a graph of queries per user, for the top 100 accounts would have lit up the issue like a Christmas tree. But it's the kind of chart you stuff on the full page of charts that are usually not interesting. So many problems have been avoided by "huh, that looks weird" on a dashboard. In a more targeted search, asking Splunk to show out where the traffic is coming from on a map, and a pie chart of the top 10 accounts they're coming from would also be illuminating. But again, this is easy for me to say in hindsight, after the issue has been helpfully explained to me in a blog post. I don't claim could have done any better if I were there, my point is that you need to see inside the system to be operate it well. |
|
Drill down by user, locale, etc would just make it even easier to figure out what’s going on once you spot the spike and start to dig.
My unsolicited advice for Zac in case they’re reading is — start thinking about SLIs (I for indicators) sooner rather than later to help you think through cases like this.