Hacker News
by QuadrupleA 2306 days ago
Cool learning exercise and fun read. I can't help but think SQLite could do this screamingly fast and very easily, with nice compact storage (a BLOB primary key holding the hashes, in a WITHOUT ROWID table to avoid the hidden integer rowid per row).
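
A minimal sketch of that SQLite idea, using Python's built-in `sqlite3` module. The table name `hibp` and the helper functions are hypothetical. The loader assumes the HIBP `HASH:COUNT` text format and keeps only the first 40 hex characters of each line, the same trick `cut -c1-40` plays further down the thread.

```python
import hashlib
import sqlite3


def create_table(conn: sqlite3.Connection) -> None:
    # WITHOUT ROWID stores rows directly in the primary-key B-tree,
    # so no hidden 64-bit rowid is kept alongside each hash.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hibp (hash BLOB PRIMARY KEY) WITHOUT ROWID"
    )


def load_hex_lines(conn: sqlite3.Connection, lines) -> None:
    # Each line looks like "HASH:COUNT"; keep the 40 hex digits and store
    # them as a 20-byte BLOB (half the size of the hex text). Blank lines are skipped.
    rows = ((bytes.fromhex(line[:40]),) for line in lines if line.strip())
    with conn:  # a single transaction for the whole batch
        conn.executemany("INSERT OR IGNORE INTO hibp (hash) VALUES (?)", rows)


def is_pwned(conn: sqlite3.Connection, password: str) -> bool:
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    # Primary-key equality lookup, answered by a B-tree search.
    row = conn.execute("SELECT 1 FROM hibp WHERE hash = ?", (digest,)).fetchone()
    return row is not None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn)
    # Made-up sample line; the count after the colon is ignored.
    sample = hashlib.sha1(b"1234567890").hexdigest().upper() + ":123"
    load_hex_lines(conn, [sample])
    print(is_pwned(conn, "1234567890"))        # True
    print(is_pwned(conn, "not in the table"))  # False
```

For the real dump you'd stream the decompressed text file into `load_hex_lines` and point `sqlite3.connect` at a file instead of `:memory:`.
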

That said, 49µs is very impressive. Hard to beat low level custom coded solutions.

> Hard to beat low level custom coded solutions.

Challenge accepted!

I used the following in a q session (lines starting with \ are handed to the shell) to download the data and load it into an on-disk object I could mmap quickly:

    \wget https://downloads.pwnedpasswords.com/passwords/pwned-passwords-sha1-ordered-by-hash-v5.7z
    \7z -so e pwned-passwords-sha1-ordered-by-hash-v5.7z pwned-passwords-sha1-ordered-by-hash-v5.txt | cut -c1-40 | xxd -r -p > hibp.input
    `:hibp 1: `s#0N 20#read1 `:hibp.input
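
For anyone wondering what that pipeline produces, here's a rough Python equivalent (the function names are made up for illustration). Each `HASH:COUNT` line keeps its 40 hex digits and becomes 20 raw bytes, so the output is just back-to-back 20-byte records; `0N 20#` then reshapes that flat byte vector into rows of 20. Because fixed-width hex strings sort in the same order as the bytes they encode, a file ordered by hash yields rows in ascending byte order, which is the property the `s#` (sorted) attribute declares.

```python
RECORD = 20  # a SHA-1 digest is 20 bytes (40 hex digits)


def pack_hashes(lines) -> bytes:
    """Rough equivalent of `cut -c1-40 | xxd -r -p`: keep the first 40
    characters of each non-blank line and decode them from hex to bytes."""
    return b"".join(bytes.fromhex(line[:40]) for line in lines if line.strip())


def to_rows(buf: bytes) -> list:
    """Rough equivalent of q's `0N 20#`: cut a flat byte string into 20-byte rows."""
    if len(buf) % RECORD:
        raise ValueError("input length is not a multiple of 20 bytes")
    return [buf[i:i + RECORD] for i in range(0, len(buf), RECORD)]


def is_sorted(rows) -> bool:
    """The property `s#` promises: every row is <= the row after it."""
    return all(a <= b for a, b in zip(rows, rows[1:]))


if __name__ == "__main__":
    # Synthetic lines in the HASH:COUNT shape, already in hash order.
    lines = [
        "0" * 40 + ":1\n",
        "0123456789ABCDEF0123456789ABCDEF01234567:1\n",
        "F" * 40 + ":1\n",
    ]
    rows = to_rows(pack_hashes(lines))
    print(len(rows), is_sorted(rows))  # 3 True
```
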
Downloading took about an hour, the 7z|cut|xxd step another hour, and the bake about 40 minutes. Once that completes, I have an on-disk artefact in kdb's native format that I can load:

    q)hibp:get`:hibp; / this mmaps the artefact almost instantly

and I can try to query it:

    q)\t:1000 {x~hibp[hibp bin x]} .Q.sha1 "1234567890"
    5

Now that's 1000 runs (\t:1000 runs the expression 1000 times and reports the total in milliseconds) taking 5 ms in total, or a 5µs average lookup time! It's entirely possible my MacBook Air is substantially faster than the author's machine, but I think a custom solution being ten times slower than an "interpreted language" suggests there's a lot of room to improve!
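
The q lookup {x~hibp[hibp bin x]} is a binary search: `bin` returns the index of the last row that is less than or equal to the SHA-1 digest (or -1 if every row is greater), and `~` checks whether that row is an exact match. A rough Python analogue over the raw 20-byte records in hibp.input might look like this. The `Records` wrapper and the function names are hypothetical, and it mmaps the flat xxd output rather than kdb's on-disk format.

```python
import bisect
import hashlib
import mmap

RECORD = 20  # bytes per SHA-1 digest


class Records:
    """Read-only sequence view over a flat buffer of sorted 20-byte records.

    len() and integer indexing are all bisect needs, so the file is never
    copied into a Python list.
    """

    def __init__(self, buf):
        self.buf = buf

    def __len__(self):
        return len(self.buf) // RECORD

    def __getitem__(self, i):
        start = i * RECORD
        return bytes(self.buf[start:start + RECORD])


def open_records(path: str) -> Records:
    # Map the whole file read-only; the OS pages it in lazily on access.
    # The mapping stays valid after the file object is closed.
    with open(path, "rb") as f:
        return Records(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))


def is_pwned(rows: Records, password: str) -> bool:
    x = hashlib.sha1(password.encode("utf-8")).digest()
    # Like `hibp bin x`: index of the last row <= x, or -1 if there is none.
    i = bisect.bisect_right(rows, x) - 1
    # Like `x~hibp[...]`: only an exact match counts.
    return i >= 0 and rows[i] == x
```

Usage would be along the lines of rows = open_records("hibp.input") and then is_pwned(rows, "1234567890"). Each lookup touches only about log2(N) records, roughly 30 for a file of a few hundred million hashes, which is why both this and the q version barely notice the size of the dataset.
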