by mlthoughts2018
1904 days ago
I think your perspective is actually way off the mark here. AI excels at long-tail problems where the cost of failure is high, precisely because human failure is so expensive in those cases and the nature of long-tail problems makes it impossible to apply QA to every use case. In other words, you know you will get it wrong a lot and pay the high failure cost, so using a system that can optimize that trade-off explicitly is often much better than pretending that a human in the loop is somehow sparing you the failure costs when they aren’t (and in fact they are simply less efficient than algorithmic solutions).
What constitutes a useful sequence of facts in root cause analysis is not some platonic, pre-existing thing. It’s a complex problem involving mind-melting log sleuthing, correlating all kinds of disparate metrics, comparing against timestamps of merges, and eventually synthesizing the results. Even seasoned veterans who know systems inside and out struggle with the sheer volume of logs, metrics, and facts to compile. And most of the time their approach is based purely on inductive experience with similar incidents combined with heuristics.
This is precisely the kind of problem that ML solutions excel at. It has many hallmarks of a good fit and almost none of the hallmarks of “solution in search of a problem” ML over-engineering.
First, you are conflating the underlying log relevance scoring ML system with the GPT-3 summarizing system. ML is a good fit for identifying relevant logs for the reasons you describe, although in my opinion, based on the examples you can find on their website, characterizing this software as root cause identification is not very accurate. But the value of summarizing a log line in natural language is low, while the cost of mischaracterizing that log line is high. Whoever needs to debug this system and find the real root cause (e.g. why did the system go OOM?) probably needs certainty more than convenience, and in all likelihood they will summarize what the log line says more accurately than GPT-3 does (obviously we don't know, since there is no evidence, but I don't work with any engineers whose ability to summarize the contents of a log line would be described as "mostly not misleading").
Secondly, I can't agree with this sentence:
> AI excels at long-tail problems where the cost of failure is high, precisely because human failure is such an expensive problem in those cases
Maybe it depends on the domain and the tech, but in my experience humans don't fail on out-of-sample data nearly as often as AI does. When they do fail, the failure is often more predictable to other humans, and humans inherently have the ability to assign confidence levels to their conclusions, which you don't see in many AI models such as GPT-3. Humans are also more effective at applying rules (e.g. common sense) to improve predictions on out-of-sample inputs. I think of "AI is worse than humans at generalizing to out-of-sample data" as a widely held, well-evidenced belief, but I would be interested to hear if you disagree.
For me, the quintessential example is something like traffic light identification, where models generally struggle to identify unseen variants correctly while humans rarely do. What examples are you thinking of where AI excels at long-tail problems?