by strags 2309 days ago
I recently needed to encode a 32-bit value into something easy for QA folks to remember and report. I opted for 3 words out of an 11-bit (2048-entry) dictionary of commonly used words; 3 x 11 = 33 bits, which is enough to cover all 32.
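A minimal sketch of that encoding in Python, not the code I actually used. The names `WORDS`, `encode` and `decode` are made up for illustration, and `WORDS` is filled with placeholder strings where the real 2048-entry dictionary would go. It assumes the value is split big-endian into 11-bit chunks, so the first word only ever uses the lower half of the list.

```python
# Placeholder dictionary (assumption): swap in a real 2048-word list.
WORDS = [f"word{i:04d}" for i in range(2048)]

BITS_PER_WORD = 11
MASK = (1 << BITS_PER_WORD) - 1  # 0x7FF


def encode(value, words=WORDS):
    """Turn a 32-bit unsigned value into three dictionary words."""
    if not 0 <= value < 2**32:
        raise ValueError("value must fit in 32 bits")
    if len(words) != 2048:
        raise ValueError("dictionary must have exactly 2048 entries")
    # Shifts of 22, 11 and 0 pick out the high, middle and low 11-bit chunks.
    return [words[(value >> shift) & MASK] for shift in (22, 11, 0)]


def decode(triple, words=WORDS):
    """Inverse of encode: three words back to the 32-bit value."""
    index = {w: i for i, w in enumerate(words)}
    value = 0
    for w in triple:
        value = (value << BITS_PER_WORD) | index[w]  # KeyError on unknown word
    if value >= 2**32:
        raise ValueError("words decode to more than 32 bits")
    return value
```

Something like `decode(encode(0xDEADBEEF))` should round-trip to the original value. The spare 33rd bit is simply left unused here; it could carry a parity bit instead.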

How to build the dictionary? Well, in order to determine the most commonly used English words, I downloaded a bunch of free texts from Project Gutenberg and did some simple filtering: nothing shorter than 5 letters, no duplication of singular + plural, etc.
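Roughly, the filtering could look like the sketch below. This is an illustration rather than my actual script: `build_wordlist` is a hypothetical name, the texts are assumed to be already loaded as strings, and the plural check is a deliberately naive "strip a trailing s" rule that will miss irregular plurals.

```python
import re
from collections import Counter


def build_wordlist(texts, size=2048, min_len=5):
    """Pick the `size` most frequent words of at least `min_len` letters."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep runs of ASCII letters only.
        counts.update(re.findall(r"[a-z]+", text.lower()))

    chosen = []
    seen = set()
    for word, _ in counts.most_common():
        if len(word) < min_len:
            continue
        # Naive plural folding (assumption): if "apples" and "apple" both
        # occur and the singular is long enough, keep only the singular.
        singular = word[:-1]
        if word.endswith("s") and singular in counts and len(singular) >= min_len:
            word = singular
        if word in seen:
            continue
        seen.add(word)
        chosen.append(word)
        if len(chosen) == size:
            break
    return chosen
```

If the corpus is too small this returns fewer than `size` words, so the length is worth checking before using the list for encoding.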

A valuable lesson that I learned during this process is that when your corpus includes older English texts, you should always give your final list a visual once-over and apply some judicious manual filtering. I'm looking at you, "The Adventures of Tom Sawyer". (And, to a lesser extent, "Moby Dick".)

2 comments

I like this. Hacked together a quick implementation in JavaScript (using QuickJS as the interpreter):

https://github.com/ratboy666/qjs-3word

In most cases, if you need a short list it's better to use something like the Diceware or EFF lists than to make your own from scratch.
Or use the BIP39 lists, since they also contain 2048 words (11 bits per word). If you just use BIP39 you also get a checksum. RFC 1751[1] is the "standardised" option, but IMHO the wordlist they use is far too easy to misread (though this is because the words are all 4 characters or fewer).

[1]: https://tools.ietf.org/html/rfc1751