Hacker News
by staticassertion 1477 days ago
Cool, TIL about blocking/filtering. We do entity resolution such that if we have a "process id" (e.g. 1234) in two different event logs, we can determine whether they're the same process (since pids get reused). We don't have to compare every process to every other process, only ones that share the same pid, which drastically reduces the number of comparisons, and then we do filtering on top of that based on other optional attributes.
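
Roughly, the idea looks like the sketch below. This is only an illustration, not our actual system: the event fields (`id`, `pid`, `exe`, `start_time`) and the helper names `candidate_pairs` and `compatible` are made up for the example. Blocking groups events by pid so only events in the same group get compared, and filtering then throws out pairs whose optional attributes disagree.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(events):
    """Blocking: only pair up events that share the same pid."""
    blocks = defaultdict(list)
    for event in events:
        blocks[event["pid"]].append(event)
    for block in blocks.values():
        # Pairs are generated within a block, never across blocks.
        yield from combinations(block, 2)


def compatible(a, b, optional_keys=("exe", "start_time")):
    """Filtering: reject a pair if an attribute present on both sides differs."""
    return all(a[k] == b[k] for k in optional_keys if k in a and k in b)


# Hypothetical events from two logs; the field names are assumptions.
events = [
    {"id": "a", "pid": 1234, "exe": "/bin/sh"},
    {"id": "b", "pid": 1234, "exe": "/bin/sh", "start_time": 100},
    {"id": "c", "pid": 1234, "exe": "/usr/bin/python3"},
    {"id": "d", "pid": 5678, "exe": "/bin/sh"},
]

matches = [
    (a["id"], b["id"])
    for a, b in candidate_pairs(events)
    if compatible(a, b)
]
print(matches)
```

With four events, comparing everything to everything would be six pairs; blocking on pid leaves three, and the attribute filter keeps only the pair whose shared attributes agree. A missing optional attribute is treated as "unknown" here rather than as a mismatch, which is a design choice for the sketch, not a rule.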

When I wrote this system I didn't know what ER even was. An article like this would have helped a lot; even just the first line defining ER would have helped.