by abotsis 1520 days ago
To give some further context, I stumbled across this thought while reading how “community IDs” are calculated. Community IDs are commonly used to simplify joins and lookups across network security tools (Suricata, Zeek). They essentially concatenate the “quad tuple” (src IP/port, dest IP/port) and a “seed”, then run SHA over the result. I didn’t entirely understand why the authors chose SHA (other than being security people who might have just reached for a cryptographically secure hash function). SHA is slow compared with something like xxHash (xxh), and given the number of sessions these tools process, it seemed like overkill. Further, it’s unclear to me what’s gained by using SHA over xxh or simply concatenating the bits. Then I started wondering about the downsides: whether it’s possible to get a false correlation because two different sessions yield the same SHA digest.
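To make the collision question concrete, here is a rough Python sketch of the scheme as described above, plus a birthday-bound estimate of how likely a false correlation is. The byte layout, the endpoint ordering, and the names `flow_digest` and `collision_probability` are my own assumptions for illustration, not the actual community ID spec, and the estimate assumes digests behave like uniform random values.

```python
import hashlib
import ipaddress
import math
import struct


def flow_digest(src_ip, src_port, dst_ip, dst_port, seed=0):
    """SHA-1 over a seed plus the four flow fields (simplified, assumed layout)."""
    a = (ipaddress.ip_address(src_ip).packed, src_port)
    b = (ipaddress.ip_address(dst_ip).packed, dst_port)
    # Assumption: sort the endpoints so both directions of a session match.
    lo, hi = sorted([a, b])
    data = (
        struct.pack("!H", seed)                 # 16-bit seed, network byte order
        + lo[0] + hi[0]                         # the two IP addresses
        + struct.pack("!HH", lo[1], hi[1])      # the two ports
    )
    return hashlib.sha1(data).hexdigest()


def collision_probability(n_flows, digest_bits):
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n_flows * (n_flows - 1) / 2.0 ** (digest_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print(flow_digest("10.0.0.1", 1234, "10.0.0.2", 80))
    for bits in (160, 64):
        print(bits, collision_probability(10**9, bits))
```

Under those assumptions, a billion distinct sessions have a vanishingly small chance of any collision with a full 160-bit digest, while a 64-bit hash lands at a few percent. So digest width matters far more for false correlations than whether the hash is cryptographic.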