If you had a language model (say, trained on existing comments from Reddit), you could encode the data in the comments in English, and make the abuse a little more subtle.
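A toy sketch of that idea. It assumes a hand-written next-word table (`NEXT_WORDS`, hypothetical) stands in for a real language model, and at each step the hidden bits pick one of the plausible next words. Every candidate list has a power-of-two length, so each pick carries whole bits. A real LM-based scheme would more likely use arithmetic coding over the model's probabilities; this is just the simplest block-coding version. The decoder needs the same model and the original message length, because the last sentence is padded with zero bits.

```python
import math

# Hypothetical toy "language model": for each word, the plausible next words.
# Each list has a power-of-two length so every choice carries whole bits.
NEXT_WORDS = {
    "<s>": ["the", "a", "my", "this"],
    "the": ["cat", "dog", "mod", "post"],
    "a": ["cat", "dog", "mod", "post"],
    "my": ["cat", "dog", "mod", "post"],
    "this": ["cat", "dog", "mod", "post"],
    "cat": ["is", "was"],
    "dog": ["is", "was"],
    "mod": ["is", "was"],
    "post": ["is", "was"],
    "is": ["great.", "awful.", "funny.", "weird."],
    "was": ["great.", "awful.", "funny.", "weird."],
}
SENTENCE_END = {"great.", "awful.", "funny.", "weird."}


def encode(bits: str) -> str:
    """Hide a string of '0'/'1' characters in generated text."""
    words, state, i = [], "<s>", 0
    # Keep going until every bit is used and the current sentence is finished.
    while i < len(bits) or state != "<s>":
        options = NEXT_WORDS[state]
        k = int(math.log2(len(options)))      # bits carried by this choice
        chunk = bits[i:i + k].ljust(k, "0")   # pad the tail with zeros
        i += k
        word = options[int(chunk, 2)]
        words.append(word)
        state = "<s>" if word in SENTENCE_END else word
    return " ".join(words)


def decode(text: str) -> str:
    """Recover the bits (including any zero padding) from the text."""
    bits, state = [], "<s>"
    for word in text.split():
        options = NEXT_WORDS[state]
        k = int(math.log2(len(options)))
        bits.append(format(options.index(word), f"0{k}b"))
        state = "<s>" if word in SENTENCE_END else word
    return "".join(bits)


print(encode("1011001110"))  # my post is awful. this cat is great.
```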
That's not entirely true: it only uses monosyllabic words for IPv6 addresses, because there's no other way to fit enough bits into the right number of syllables.
For IPv4 addresses, there's loads of space, so I can afford to use some longer words.
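The actual word list and scheme aren't given here, so this is only a sketch of the general fixed-width idea. It assumes a hypothetical list of 256 made-up one-syllable words (16 onsets times 16 rimes), so each word carries exactly one octet. An IPv4 address then becomes four words, while an IPv6 address (128 bits) would need 16 of them, which is why squeezing bits into short words matters more there.

```python
import ipaddress

# Hypothetical word list: 16 onsets x 16 rimes = 256 one-syllable "words",
# so each word encodes exactly 8 bits (one octet).
ONSETS = "bdfghjklmnprstvz"
RIMES = ["ab", "ad", "ag", "am", "an", "ap", "at", "ed",
         "eg", "en", "et", "ig", "in", "ip", "og", "ug"]
WORDS = [o + r for o in ONSETS for r in RIMES]
INDEX = {w: i for i, w in enumerate(WORDS)}


def ipv4_to_words(addr: str) -> str:
    """Map each octet of an IPv4 address to one word."""
    octets = ipaddress.IPv4Address(addr).packed
    return "-".join(WORDS[b] for b in octets)


def words_to_ipv4(phrase: str) -> str:
    """Inverse of ipv4_to_words."""
    octets = bytes(INDEX[w] for w in phrase.split("-"))
    return str(ipaddress.IPv4Address(octets))


print(ipv4_to_words("192.168.0.1"))  # sab-peg-bab-bad
```

With a longer list (say, 4096 words of any length) each word would carry 12 bits, which is the "room for longer words" trade-off for IPv4.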
I wonder if there are any libraries that can do this. I was thinking of writing a password-generator web app that creates full diceware sentences (TheBlubberyPythonFloatedDownThePurpleFunicular == lots of entropy and easy to remember), but I'd need a decent language model for that. (And I'm not motivated enough to write my own.)
If you're making full sentences anyway, the grammar of the sentence doesn't need to change much. The vast majority of the entropy is already in the words themselves.
Example sentence (generated by me, not an RNG): "the tiny hairy fish quickly paints a big scary monster".
EDIT: With 10 words, each drawn from the 252 most common words, sentences of this type would have more than 10^24 possible combinations, or about 2^80 (roughly 80 bits of entropy). I guess the articles are really just "the" vs. "a/an", though, so only 8 of the words really matter...
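The arithmetic above, checked quickly. This assumes every slot is filled uniformly at random, which a human picking the words (as in the example sentence) won't actually achieve; `entropy_bits` is just an illustrative helper.

```python
import math


def entropy_bits(choices_per_slot):
    """Bits of entropy for independent, uniformly random slots."""
    return sum(math.log2(n) for n in choices_per_slot)


# 10 slots, each drawn from 252 words (the first estimate).
all_slots = entropy_bits([252] * 10)
# Articles as a 2-way choice ("the" vs "a/an"), plus 8 full slots.
with_articles = entropy_bits([2, 2] + [252] * 8)

print(f"{all_slots:.1f}")      # 79.8
print(f"{with_articles:.1f}")  # 65.8
```

Since 252 is just under 2^8, ten slots land slightly below 80 bits, and collapsing the articles to one bit each costs about 14 bits.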
Well, one, I'd love a more general solution where I could just say "generate a sentence with n bits of entropy" and my algorithm would spin out a sentence of the correct (arbitrary) length. (Hmm... Markov chains?) Or maybe add other mnemonic modifications, like rhymes. And two, I still need an algorithm to conjugate verbs and whatnot, though I suppose that part could just be left to the user. (You get n diceware words — make your own sentence out of them.) But that's boring!
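The "arbitrary length" part is easy if grammar is ignored: keep drawing words until the running total reaches n bits. A minimal sketch using Python's `secrets` module as the random source; `passphrase` and the 16-word `DEMO_WORDS` list are hypothetical, and a real list would be far longer.

```python
import math
import secrets

# Hypothetical demo list; 16 distinct words gives exactly 4 bits per word.
DEMO_WORDS = ["blubbery", "python", "floated", "down", "purple", "funicular",
              "tiny", "hairy", "fish", "quickly", "paints", "big",
              "scary", "monster", "ancient", "kettle"]


def passphrase(wordlist, min_bits):
    """Draw words uniformly at random until the phrase reaches min_bits of entropy."""
    words = sorted(set(wordlist))  # duplicates would inflate the entropy estimate
    if len(words) < 2:
        raise ValueError("need at least two distinct words")
    bits_per_word = math.log2(len(words))
    chosen, bits = [], 0.0
    while bits < min_bits:
        chosen.append(secrets.choice(words))
        bits += bits_per_word
    return chosen, bits


phrase, bits = passphrase(DEMO_WORDS, 20)
# Five random words in CamelCase, worth 20.0 bits with this list.
print("".join(w.capitalize() for w in phrase), bits)
```

This is the "you get n diceware words" baseline; turning the words into a grammatical sentence is the hard part.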
As for word commonality, I'm pretty sure you could actually use something like the 5000 most common words. The people who care about this kind of stuff tend to have large vocabularies!
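To put numbers on the vocabulary trade-off: a rough sketch of how many uniformly random words it takes to reach 80 bits for a few list sizes. 7776 (6^5) is the size of the classic Diceware list; `words_needed` is a hypothetical helper.

```python
import math


def words_needed(vocab_size, target_bits):
    """Smallest number of uniformly random words that reaches target_bits."""
    return math.ceil(target_bits / math.log2(vocab_size))


for size in (252, 5000, 7776):
    print(size, round(math.log2(size), 2), words_needed(size, 80))
# 252 7.98 11
# 5000 12.29 7
# 7776 12.92 7
```

Going from 252 to 5000 words shortens an 80-bit sentence by about four words.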
Ah, true — I was thinking more along the lines of "part of speech" Markov chains, if that's even possible. (As in, just an endless stream of "article noun verb adjective noun adverb conjunction adjective noun verb adjective noun conjunction adjective etc." that could then be mad-libbed by diceware.)
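Something like that is possible with a first-order Markov chain over part-of-speech slots. A rough sketch: the transition table, slot names, and tiny word lists are all hypothetical. It walks the chain, mad-libs each slot with `secrets.choice`, and only lets the sentence end on an object noun once the word choices add up to the requested bits. Verbs are stored pre-conjugated, and every noun and adjective starts with a consonant so "a" never needs to become "an". The bit count covers only the word choices (treating the template as known to an attacker), which is the conservative way to count it.

```python
import math
import secrets

# Hypothetical word pools, one per part of speech.
WORDS = {
    "article": ["the", "a"],
    "adjective": ["tiny", "hairy", "big", "scary",
                  "purple", "blubbery", "sleepy", "grumpy"],
    "noun": ["fish", "monster", "python", "funicular",
             "kettle", "wizard", "llama", "pirate"],
    "verb": ["paints", "chases", "tickles", "ignores"],
    "adverb": ["quickly", "slowly", "gladly", "rarely"],
    "conjunction": ["and", "but"],
}

# Hypothetical slot-level Markov chain. Subject and object slots are separate
# states so the chain knows where a clause is allowed to end.
TRANSITIONS = {
    "START": ["art1"],
    "art1": ["adj1", "noun1"],
    "adj1": ["adj1", "noun1"],
    "noun1": ["adv", "verb"],
    "adv": ["verb"],
    "verb": ["art2"],
    "art2": ["adj2", "noun2"],
    "adj2": ["adj2", "noun2"],
    "noun2": ["conj", "END"],  # chosen by bit count in madlib_sentence
    "conj": ["art1"],
}
POS = {"art1": "article", "art2": "article", "adj1": "adjective",
       "adj2": "adjective", "noun1": "noun", "noun2": "noun",
       "adv": "adverb", "verb": "verb", "conj": "conjunction"}


def madlib_sentence(min_bits):
    """Generate a slot template with the chain and fill it with random words."""
    words, bits, state = [], 0.0, "START"
    while True:
        if state == "noun2":
            # Only end after an object noun, and only once enough bits exist.
            state = "END" if bits >= min_bits else "conj"
        else:
            state = secrets.choice(TRANSITIONS[state])
        if state == "END":
            break
        pool = WORDS[POS[state]]
        words.append(secrets.choice(pool))
        bits += math.log2(len(pool))  # entropy contributed by this word choice
    return " ".join(words), bits


sentence, bits = madlib_sentence(40)
print(sentence, f"({bits:.0f} bits)")
```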
Wikipedia is full of hidden messages. A common pattern I have observed is the first letter of each sentence being used to string together a message. You can read more about this tactic here: https://uncyclopedia.wikia.com/wiki/Subliminal_Messages
E.g.:
- https://gfycat.com/JaggedIdealFrillneckedlizard
- https://gfycat.com/ThirstyAmbitiousBuzzard
- https://gfycat.com/AlertSpicyBlueandgoldmackaw