|
|
|
|
|
by jiggawatts
217 days ago
|
|
The sorting is the slowest step by far. Hashing is so fast that you can hand-wave it away as zero cost relative to the time taken to read such a large amount of data. Also, you only have to do it once for the whole input, which means that it's O(n) time where 'n' is the gigabytes of passwords you have. Sorting is going to need about O(n * log n) time even if it's entirely in memory, but more if it has to spool to disk storage then it'll take much longer than the hashing step. PS: I just realised that 2 billion passwords is not actually that much data -- only 40 GB of hashes -- that's well within the range of what's "easy" to sort in-memory by simply creating an array of hashes that size and calling a standard library sort function. |
|