Hacker News new | ask | show | jobs
by viraptor 2306 days ago
This is pretty much the case for probabilistic membership functions. It's kind of a broken bloom filter where you take only the front few bits without hashing. I wouldn't expect someone to come up with that under the interview stress, but if you know a few basic algorithms, this should be pretty simple/familiar.

It's also similar to how we spread mail folders in the past to avoid large directories. Or how Debian splits repo by package prefix: http://ftp.us.debian.org/debian/pool/main/

1 comments

Exactly and good examples. I believe the general concept is just sharding, for anyone that wants to google it further. Typically naive sharding on the first few bytes is useless because it results in uneven distributions but (and this is what I mean when I said SHA was designed for this) a cryptographic hash should result in a uniform random-appearing distribution across all its bits, meaning the resulting hash is uniquely suitable for basic sharding by prefix.

But then they wouldn’t have been able to give it a fancy name if they called it what it is.