| This is a really informative write-up and an excellent learning exercise. It's worth noting that haveibeenpwned's API has a really clever design for letting people look up their passwords without transmitting them to the site. It's explained here: https://www.troyhunt.com/ive-just-launched-pwned-passwords-v... The short version is that you take the first 5 hex characters of the password's SHA-1 hash and hit this endpoint: https://api.pwnedpasswords.com/range/21BD1 The endpoint returns (right now) a list of 528 hash suffixes, each the remaining 35 characters of a matching hash, along with counts. You can compare the rest of your calculated SHA-1 hash to that list to see if the password is present in the dump. The trick here is called k-anonymity - I think it's a really elegant solution. This technique is written up in more detail here: https://blog.cloudflare.com/validating-leaked-passwords-with... |
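For anyone who wants to try the lookup described above, here is a minimal Python sketch. It assumes the range endpoint answers with one `SUFFIX:COUNT` entry per line, where each suffix is the 35 hex characters that follow the 5-character prefix. The function names (`split_sha1`, `count_in_range`, `pwned_count`) and the `User-Agent` string are made up for illustration and are not part of any official client.

```python
import hashlib
import urllib.request

# Endpoint taken from the comment above; everything else is illustrative.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_sha1(password: str) -> tuple[str, str]:
    """Return (5-char prefix, 35-char suffix) of the uppercase hex SHA-1."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, body: str) -> int:
    """Find a suffix in a range response; assumes 'SUFFIX:COUNT' lines."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix not listed, so the password is not in this bucket


def pwned_count(password: str) -> int:
    """Query the range endpoint; only the 5-char prefix leaves the machine."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "k-anonymity-sketch"},  # hypothetical value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range(suffix, body)
```

Calling `pwned_count("password")` does a live request, so the number it returns depends on the current data set. The comparison against the returned list happens locally in `count_in_range`, which is the whole point: the server only ever sees the prefix.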
E.g. 512-bit SHA-2 and SHA-3 may be truncated at 128 or 256 bits if that’s all the entropy you need (and you don’t need to be compatible with the formal SHA2/SHA3-512/256 spec). Here, Cloudflare is truncating to an intentionally low entropy of just 20 bits, not to reduce the security but rather to intentionally increase the collisions. It’s ultimately just a glorified hash table and as any CS student can tell you, the bucket size is just a function of the hash size (v1 of the API: the full 160-bit SHA-1 hash, v2 of the API: a 20-bit hash).
(I don’t like pretension.)
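To make the hash-table point concrete, here is a rough Python sketch of bucketing by a truncated hash. It assumes SHA-1 and a 5-hex-character (20-bit) prefix as discussed above; `bucket_key`, `build_buckets`, and `expected_bucket_size` are hypothetical names, and the sample passwords are arbitrary.

```python
import hashlib
from collections import defaultdict

PREFIX_HEX_CHARS = 5  # 5 hex chars * 4 bits each = 20 bits


def bucket_key(password: str, prefix_hex_chars: int = PREFIX_HEX_CHARS) -> str:
    """Truncate the hex SHA-1 digest to get the bucket key."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_hex_chars]


def build_buckets(passwords, prefix_hex_chars: int = PREFIX_HEX_CHARS):
    """Group hash suffixes by truncated prefix, like a chained hash table."""
    buckets = defaultdict(list)
    for password in passwords:
        digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
        buckets[digest[:prefix_hex_chars]].append(digest[prefix_hex_chars:])
    return dict(buckets)


def expected_bucket_size(corpus_size: int,
                         prefix_hex_chars: int = PREFIX_HEX_CHARS) -> float:
    """Average entries per bucket, assuming uniformly distributed hashes."""
    return corpus_size / 16 ** prefix_hex_chars
```

With 20 bits there are 16^5 = 2^20 = 1,048,576 possible buckets. Under the uniformity assumption, a bucket holding 528 entries would point to a corpus of roughly 528 × 2^20 ≈ 554 million hashes. A shorter prefix gives bigger buckets (more anonymity, more bandwidth), a longer one gives smaller buckets.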