> maybe I'm fundamentally missing something about how I'm "supposed to work", but honestly all I have ever wanted to do, when looking at logs, is see the log from one process, from beginning to end, as a text file.

This is still a valid use case, but pretend for a minute that you have thousands or millions of log lines to inspect. Even after filtering for ERROR level only, you still have too many errors that the devs swear are "normal" (but do not fix). And maybe the data you need for the diagnosis isn't even logged at ERROR level!

The solution? Use log queries to compare a normal and an abnormal process or cluster, group the log lines by some kind of fingerprint, then apply Laplace smoothing or other Bayesian techniques to score each fingerprint by how strongly it is associated with the abnormal side.

This lets me rapidly identify problems at scale that would otherwise take hours of poring over logs to exclude stuff by hand.

This works any time you can divide logs into "good" and "bad." Example scenarios:

- canary analysis, comparing canary and baseline
- a single faulty pod in a deploy, comparing the bad container to the n good ones
- one AZ or region in a multi-region deploy
- now versus yesterday, or versus an hour ago, etc.
- Android versus iPhone
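
To make the fingerprint-and-score idea above concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than any particular tool's API: the `fingerprint` and `score_fingerprints` names, the masking rules (digits and long hex strings), and the choice of a Laplace-smoothed log-odds ratio as the score. A real pipeline would likely get the per-fingerprint counts from its log query system instead of from raw lines.

```python
import math
import re
from collections import Counter

# Hypothetical masking rules: collapse variable tokens so that lines which
# differ only in IDs, addresses, or timings share one fingerprint.
_HEX = re.compile(r"\b0x[0-9a-fA-F]+\b|\b[0-9a-fA-F]{8,}\b")
_NUM = re.compile(r"\d+")


def fingerprint(line: str) -> str:
    """Reduce a log line to a rough template by masking hex IDs and numbers."""
    line = _HEX.sub("<HEX>", line)
    line = _NUM.sub("<N>", line)
    return line.strip()


def score_fingerprints(good_lines, bad_lines, alpha=1.0):
    """Rank fingerprints by Laplace-smoothed log-odds of appearing in "bad".

    alpha is the smoothing pseudo-count (must be > 0); it keeps fingerprints
    that appear on only one side from producing a division by zero.
    Returns (fingerprint, score) pairs, most "bad"-associated first.
    """
    good = Counter(fingerprint(line) for line in good_lines)
    bad = Counter(fingerprint(line) for line in bad_lines)
    vocab = set(good) | set(bad)

    # Smoothed denominators: observed lines plus alpha per known fingerprint.
    good_total = sum(good.values()) + alpha * len(vocab)
    bad_total = sum(bad.values()) + alpha * len(vocab)

    scores = {}
    for fp in vocab:
        p_bad = (bad[fp] + alpha) / bad_total
        p_good = (good[fp] + alpha) / good_total
        scores[fp] = math.log(p_bad / p_good)

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample lines, purely for illustration.
    good_sample = ["GET /health 200 in 3 ms"] * 50 + ["ERROR cache miss for key 123"] * 5
    bad_sample = (
        ["GET /health 200 in 4 ms"] * 50
        + ["ERROR cache miss for key 456"] * 5
        + ["WARN connection reset by peer 10.0.0.7"] * 8
    )
    for fp, score in score_fingerprints(good_sample, bad_sample):
        print(f"{score:+.2f}  {fp}")
```

In this sketch a positive score means the fingerprint is relatively more common on the "bad" side, and a score near or below zero means it is ordinary background noise, which is how the "those are normal" errors drop out of the ranking without being excluded by hand.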