by geocar
2306 days ago
> Hard to beat low level custom coded solutions.

Challenge accepted! I used the following in q to download the data and load it into a disk object I could mmap quickly:

    \wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v5.7z
    \7z -so e pwned-passwords-sha1-ordered-by-hash-v5.7z pwned-passwords-sha1-ordered-by-hash-v5.txt | cut -c1-40 | xxd -r -p > hibp.input
    `:hibp 1: `s#0N 20#read1 `:hibp.input

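For anyone not fluent in shell, the `cut -c1-40 | xxd -r -p` step above keeps the first 40 hex characters of each line (the SHA-1 digest) and packs them into 20 raw bytes, so `hibp.input` ends up as a flat array of fixed-width records. A rough Python sketch of that packing step, not the pipeline actually used: the helper name `pack_hashes` is made up, and the `HASH:count` sample lines (with invented counts) are an assumption about the dump layout.

```python
import hashlib
import io

def pack_hashes(lines, out):
    """Write the first 40 hex chars of each line to `out` as 20 raw bytes."""
    n = 0
    for line in lines:
        out.write(bytes.fromhex(line[:40]))  # same effect as cut -c1-40 | xxd -r -p
        n += 1
    return n

# Hypothetical sample lines in "HASH:count" form; the counts are invented.
passwords = [b"1234567890", b"123456"]
sample = sorted(hashlib.sha1(p).hexdigest().upper() + ":1\n" for p in passwords)

buf = io.BytesIO()
count = pack_hashes(sample, buf)
packed = buf.getvalue()
print(count, len(packed))  # prints "2 40": two records of 20 bytes each
```

Because the input is already ordered by hash, the packed records stay sorted byte-wise, which is what lets the q side tag the list with the sorted attribute.
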
This took about an hour to download, an hour to 7z|cut|xxd, and about 40 minutes to bake. Once that completes, I have an on-disk artefact in kdb's native format. I can load it:

    q)hibp:get`:hibp; / this mmaps the artefact almost instantly

and I can try to query it:

    q)\t:1000 {x~hibp[hibp bin x]} .Q.sha1 "1234567890"
    5

Now that's 1000 runs taking 5msec in total, or 5µsec average lookup time! It's entirely possible my MacBook Air is substantially faster than the authors' machine, but I think being ten times slower than an "interpreted language" suggests there's a lot of room to improve!
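
The lookup above is a binary search over sorted 20-byte records in a memory-mapped file, followed by an equality check. Here is a minimal Python sketch of the same idea, offered as an illustration rather than a translation of the q: `contains` and the `toy.input` file are hypothetical names, the file holds three sample digests instead of the real data set, and it uses a lower-bound search where q's `bin` finds the last record less than or equal to the key (the membership answer is the same).

```python
import hashlib
import mmap
import os
import tempfile

RECORD = 20  # bytes per raw SHA-1 digest

def contains(buf, digest):
    """Binary-search a buffer of sorted 20-byte records for `digest`."""
    lo, hi = 0, len(buf) // RECORD
    while lo < hi:
        mid = (lo + hi) // 2
        if buf[mid * RECORD:(mid + 1) * RECORD] < digest:
            lo = mid + 1
        else:
            hi = mid
    # lo is now the first record >= digest, if any such record exists
    return lo * RECORD < len(buf) and buf[lo * RECORD:(lo + 1) * RECORD] == digest

# Build a tiny stand-in for hibp.input: three sorted digests (toy data).
digests = sorted(hashlib.sha1(p).digest()
                 for p in (b"1234567890", b"password", b"qwerty"))
path = os.path.join(tempfile.mkdtemp(), "toy.input")
with open(path, "wb") as f:
    f.write(b"".join(digests))

# Map the file read-only and query it without loading it into a list.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
    found = contains(m, hashlib.sha1(b"1234567890").digest())
    missing = contains(m, hashlib.sha1(b"not in the toy list").digest())

print(found, missing)  # prints "True False"
```

The point of the fixed record width is that record `i` lives at byte offset `20*i`, so each probe is one slice of the mapping and only the pages actually touched get read from disk.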