by nextts
464 days ago
Funny, I was thinking this week that logging needs some magic. Log diving takes a lot of time, especially during some kind of outage/downtime/bug where the whole team might be watching a screen share of someone diving into logs. At the same time, I am sceptical about "AI", especially if it is just an LLM stumbling around. Understanding logs is probably the most brain-intensive part of the job for me, more so than system design, project planning or coding. This is because you need to know where the code is logging and imagine code paths in your head, and you constantly see stuff that is a red herring or doesn't make sense. I hope you can improve this space, but it won't be easy!
|
As for the skepticism about LLMs stumbling around raw logs: it's super deserved. Even the developers who wrote the program often refer to the larger app context when debugging, so it's not as easy as throwing a bunch of logs into an LLM. Plus, context window limits and the relative lack of "understanding" as contexts grow larger are troublesome.
We found it helped a lot to profile application logs over time. Think aggregation, but for individual flows rather than similar logs. Grouping and ordering the logs of each flow together brings the context of thousands of (repetitive) logs down to the core flows, which makes it much easier to spot when things are out of the ordinary.
There's still a lot to improve with regard to false positives and variations in application flows.
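To make the flow-profiling idea a bit more concrete, here's a rough sketch in Python. It is not our actual implementation, just an illustration under some assumptions: every log record carries a flow/request id, variable parts of a message can be masked with a simple regex, and a flow counts as "unusual" if its ordered sequence of templates was rarely seen in a baseline. The names (`template`, `flows`, `profile`, `unusual`) and the sample logs are all made up for the example.

```python
import re
from collections import Counter, defaultdict


def template(message):
    """Mask numbers and hex ids so repetitive logs collapse to one template."""
    return re.sub(r"\b(?:0x[0-9a-fA-F]+|\d+)\b", "<*>", message)


def flows(records):
    """Group (flow_id, message) records per flow, keeping arrival order."""
    grouped = defaultdict(list)
    for flow_id, message in records:
        grouped[flow_id].append(template(message))
    return {flow_id: tuple(seq) for flow_id, seq in grouped.items()}


def profile(records):
    """Count how often each ordered template sequence (a 'core flow') occurs."""
    return Counter(flows(records).values())


def unusual(records, baseline, min_count=2):
    """Return flows whose sequence was seen fewer than min_count times in baseline."""
    return {
        flow_id: seq
        for flow_id, seq in flows(records).items()
        if baseline[seq] < min_count
    }


# Hypothetical sample data: (flow id, raw log message).
baseline_logs = [
    ("r1", "order 17 received"),
    ("r1", "charged card in 120 ms"),
    ("r1", "order 17 shipped"),
    ("r2", "order 42 received"),
    ("r2", "charged card in 95 ms"),
    ("r2", "order 42 shipped"),
]
new_logs = [
    ("r3", "order 99 received"),
    ("r3", "charged card in 130 ms"),
    ("r3", "order 99 shipped"),
    ("r4", "order 7 received"),
    ("r4", "card declined"),
    ("r4", "order 7 cancelled"),
]

baseline = profile(baseline_logs)
for flow_id, seq in unusual(new_logs, baseline).items():
    print(flow_id, "->", " | ".join(seq))
```

With this toy data only `r4` gets flagged, since its sequence never shows up in the baseline. It also shows where the false positives come from: any legitimate but rare variation of a flow looks exactly like an anomaly to an exact-sequence match.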