Hacker News new | ask | show | jobs
by archagon 4032 days ago
Well, one, I'd love a more general solution where I could just say "generate a sentence with n bits of entropy" and my algorithm would spin out a sentence of the correct (arbitrary) length. (Hmm... Markov chains?) Or maybe add other mnemonic modifications, like rhymes. And two, I still need an algorithm to conjugate verbs and whatnot, though I suppose that part could just be left to the user. (You get n diceware words — make your own sentence out of them.) But that's boring!

In regards to word commonality, I'm pretty sure you could in fact use something like the 5000 most common words. The people who care about this kind of stuff tend to have large vocabularies!

1 comments

I think Markov chains would be a bad idea for the use case of passwords because some words always follow certain words.
Ah, true — I was thinking more along the lines of "part of speech" Markov chains, if that's even possible. (As in, just an endless stream of "article noun verb adjective noun adverb conjunction adjective noun verb adjective noun conjunction adjective etc." that could then be mad-libbed by diceware.)
It is possible (and, I think, a rather clever idea).

You could, for example, use a part-of-speech tagged corpus (a large collection of text where each word was tagged with its PoS by a grad student). Just train a Markov model on the parts of speech instead of the words themselves, and you would be able to generate English-like mad-libs.