Hacker News new | ask | show | jobs
by gopher 6441 days ago
I've created wordlists from Wikipedia database dumps some time ago (http://benjamin-schweizer.de/files/wordlist-wikipedia/); they are pretty large and thus, useful for dictionary attacks. The wordlists are sorted, common words are on top of the lists.

I think that there is a typical password length, so you could improve the sorting based upon a multi-dimensional rating scheme. I'd use expected password length and commonness of a word as factors. Mixing these real words with computer generated words might speed up brute force attacks.

However, I'm not sure how to integrate ordered wordfiles with rainbow tables. Any ideas?

2 comments

I don't think word frequency is a good estimator for the likeliness of passwords. Many frequently used words -- like connectors or adverbs -- are unlikely to be used as passwords. I expect proper names (of people, places, or cultural works) are the most common passwords, which are at a relative disadvantage in word frequency lists.
rainbow tables are an implementation of the time-space tradeoff concept: you are trying to search through a space so large you cannot enumerate it. if you have already enumerated it, as in a wordlist, it is not meaningful to use rainbow tables. it's not a question of how; it's not even a well-defined operation.

that's great that you've made that list though.. i wanted word frequency tables for my startup which is an entirely unrelated type of project. if i hadn't found this i would have compiled it myself; thanks much :)

while your list doesn't have frequencies, i guess i can use the position in the list as a proxy for frequencies. but it's not optimal. any chance you can put up a list which also has the counts?

I don't have a current dump, but if you send me an email, I can give you the script I've written to create those dumps. It prints out the counts.