by sethammons
1000 days ago
Splunk is hands down the best log analysis tooling I've used. If not for the hefty price tag, I'd use it for my personal stuff and at every workplace I've been. Structured logs and Splunk are the stuff dreams are made of if you care about monitoring the quality of software. The logs-to-metrics capability, along with the ability to uncover relationships in your data, is amazing. Mouse over the fields found in logs matching your search and see the top N values for those keys. Imagine getting an alert, searching your logs for that error message, and immediately seeing that it affects these N users disproportionately, that it is split 50/50 across two of your seven regions, and that it only affects version X of your service. A couple more searches to dig in and you can see it is only feature Y with setting Z that is the problem. You switch to a timechart view and can see the moment the error started and the affected user counts. A few more minutes and your support team has a list of known affected users. You decide to monitor this new feature, so you quickly create a new dashboard (or a panel on an existing dashboard) and a new alert. At no time did you have to declare a field of your structured logs as an index or as searchable or aggregatable.
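To make the "top N values per field, with no schema declared up front" idea concrete, here is a rough Python sketch. It is not how Splunk works internally and it does not use any Splunk API. It only imitates the workflow on a handful of made-up JSON log lines: search for an error message, then count the most common values of every field found in the matching events. The field names (user, region, version), the sample data, and the helper names search and top_values are all hypothetical.

```python
import json
from collections import Counter

# Hypothetical structured logs, one JSON object per line.
# Field names and values are invented for illustration only.
RAW_LOGS = """\
{"ts": "2024-01-01T10:00:00Z", "level": "error", "msg": "payment failed", "user": "u1", "region": "us-east", "version": "2.3"}
{"ts": "2024-01-01T10:00:05Z", "level": "error", "msg": "payment failed", "user": "u1", "region": "eu-west", "version": "2.3"}
{"ts": "2024-01-01T10:00:09Z", "level": "info", "msg": "payment ok", "user": "u3", "region": "ap-south", "version": "2.2"}
{"ts": "2024-01-01T10:00:12Z", "level": "error", "msg": "payment failed", "user": "u2", "region": "us-east", "version": "2.3"}
{"ts": "2024-01-01T10:00:20Z", "level": "error", "msg": "payment failed", "user": "u1", "region": "eu-west", "version": "2.3"}
"""


def search(lines, needle):
    """Yield parsed events whose 'msg' field contains the search text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if needle in event.get("msg", ""):
            yield event


def top_values(events, n=3, skip=("ts", "msg")):
    """Return the n most common values for every field seen in the events.

    No field is declared ahead of time: keys are discovered while reading.
    """
    counters = {}
    for event in events:
        for key, value in event.items():
            if key in skip:
                continue
            counters.setdefault(key, Counter())[str(value)] += 1
    return {key: counter.most_common(n) for key, counter in counters.items()}


matches = list(search(RAW_LOGS.splitlines(), "payment failed"))
summary = top_values(matches)

for field, values in summary.items():
    print(field, values)
```

On this toy data the summary shows the error concentrated on one user, split evenly across two regions, and confined to a single version, which is the kind of pattern the mouse-over view surfaces without any extra searches.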
We used Splunk to trace a change request ticket number all the way through the change control process to the Puppet log output, tagging each change with its original business purpose.
It was like magic for auditors back then, and I rarely see that depth of tracing from automated changes to business purpose in the field today, though we get close with GitOps.