"Sales quantities are always > 0"; "Ticket numbers are unique"; "Returns are always linked to an existing ticket"

You know, stuff you usually put in assert statements. Except in real life the data doesn't strictly obey these laws. So you need a way to find out what percentage of your data is "misshapen" and decide what to do with it. You can't necessarily filter those rows out, because they might reflect some business process that carries relevant information. Your goal is to easily identify these outliers without crashing your processing (that would be stupid).
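
To make that concrete, here is a rough sketch of the idea in plain Python: instead of asserting, tag each row with the rules it breaks and report the percentages. The field names (`ticket`, `quantity`), the sample rows, and the helpers `flag_violations` and `violation_rates` are all made up for illustration; they are assumptions, not any real schema or library.

```python
from collections import Counter

# Hypothetical sample data; field names are assumptions for illustration.
sales = [
    {"ticket": "T1", "quantity": 2},
    {"ticket": "T2", "quantity": 0},   # breaks "quantity > 0"
    {"ticket": "T2", "quantity": 1},   # breaks "ticket numbers are unique"
    {"ticket": "T3", "quantity": 5},
]
returns = [
    {"ticket": "T1"},
    {"ticket": "T9"},                  # no matching sales ticket
]

def flag_violations(sales, returns):
    """Tag every row with the rules it breaks instead of raising."""
    ticket_counts = Counter(row["ticket"] for row in sales)
    known_tickets = set(ticket_counts)

    flagged_sales = []
    for row in sales:
        problems = []
        qty = row.get("quantity")
        # Missing or non-numeric quantities are flagged too, not crashed on.
        if not (isinstance(qty, (int, float)) and qty > 0):
            problems.append("quantity_not_positive")
        if ticket_counts[row["ticket"]] > 1:
            problems.append("duplicate_ticket")
        flagged_sales.append({**row, "problems": problems})

    flagged_returns = [
        {**row, "problems": [] if row["ticket"] in known_tickets else ["orphan_return"]}
        for row in returns
    ]
    return flagged_sales, flagged_returns

def violation_rates(flagged):
    """Percentage of rows breaking each rule."""
    total = len(flagged)
    if total == 0:
        return {}
    counts = Counter(p for row in flagged for p in row["problems"])
    return {name: 100.0 * n / total for name, n in counts.items()}

flagged_sales, flagged_returns = flag_violations(sales, returns)
print(violation_rates(flagged_sales))
print(violation_rates(flagged_returns))
```

Every row survives with a `problems` list attached, so you can route the odd ones to a review queue, or keep them in the pipeline, rather than silently dropping them.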