| HN Mirror

defmodule Pwned do def load do :ets.new(:table, [:named_table, :set]) File.stream!("pwned-passwords-sha1-ordered-by-hash-v5.txt") |> Stream.each(fn line -> <<hash::binary-size(40), ":", _rest::binary>> = String.trim_trailing(line) hash = :crypto.hash(:md5, Base.decode16!(hash)) :ets.insert(:table, {:binary.copy(hash), true}) end) |> Stream.run() end def lookup_hash(hash) do hash = :crypto.hash(:md5, Base.decode16!(hash)) case :ets.lookup(:table, hash) do [] -> false _ -> true end end def lookup_password(password) do lookup_hash(:crypto.hash(:sha, password) |> Base.encode16()) end end

I agree these performance numbers don't seem great. Using q (another interpreted language; not compiled) I get 5µsec on my Macbook Air:

Here's my load script:

    \wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v5.7z
    \7z -so e pwned-passwords-sha1-ordered-by-hash-v5.7z pwned-passwords-sha1-ordered-by-hash-v5.txt | cut -c1-40 | xxd -r -p > hibp.input
    `:hibp 1: `s#0N 20#read1 `:hibp.input

I can then shut down this process, and start a new one:

    q)hibp:get`:hibp; / this mmaps the artefact almost instantly
    q)\t:1000 {x~hibp[hibp bin x]} .Q.sha1 "1234567890"
    5

It's so fast I need to run it 1000 times to take just 5msec (5µsec average lookup time!). I imagine converting to md5 would be substantially faster since there's a 16-byte scalar type in q I would be able to use.