|
using ETS with Erlang (or Elixir) I get sub 50μs (30μs on avergae) lookup times. Memory usage is quite high, around 95 bytes per element, bu I'm sure that by spending more than 5 minutes on it, like I did, one can take it down considerably For reference, this is the code I used I converted the SHA hashes to MD5 to save memory, given we don't care about collisions (which are very unlikely anyway), we just want to know if the password was there or not. defmodule Pwned do
def load do
:ets.new(:table, [:named_table, :set])
File.stream!("pwned-passwords-sha1-ordered-by-hash-v5.txt")
|> Stream.each(fn line ->
<<hash::binary-size(40), ":", _rest::binary>> = String.trim_trailing(line)
hash = :crypto.hash(:md5, Base.decode16!(hash))
:ets.insert(:table, {:binary.copy(hash), true})
end)
|> Stream.run()
end
def lookup_hash(hash) do
hash = :crypto.hash(:md5, Base.decode16!(hash))
case :ets.lookup(:table, hash) do
[] -> false
_ -> true
end
end
def lookup_password(password) do
lookup_hash(:crypto.hash(:sha, password) |> Base.encode16())
end
end
|
Here's my load script:
I can then shut down this process, and start a new one: It's so fast I need to run it 1000 times to take just 5msec (5µsec average lookup time!). I imagine converting to md5 would be substantially faster since there's a 16-byte scalar type in q I would be able to use.