by joefreeman 4032 days ago
If you had a language model (say, trained on existing comments from Reddit), you could encode the data in the comments in English, and make the abuse a little more subtle.
4 comments

This is also how hipku stores IP addresses as haiku

demo - http://hipku.gabrielmartin.net

explanation - http://gabrielmartin.net/projects/hipku/

No wonder why this is everywhere:

  The hungry white ape

  aches in the ancient canyon.

  Autumn colors crunch.
https://www.google.com/search?q=%22%22The+hungry+white+ape+a...
According to the overview on the hipku website, it uses only monosyllabic words, unlike this haiku.
That's not entirely true - it only uses monosyllabic words for IPv6 addresses because there's no other way to fit enough bits into the right number of syllables.

For IPv4 addresses, there's loads of space, so I can afford to use some longer words.
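A rough way to see the space argument is to divide the address size by the 17 syllables of a haiku. This is only back-of-the-envelope arithmetic; it assumes every syllable carries data and says nothing about how hipku actually lays out its dictionaries.

```python
# Back-of-the-envelope sketch: how many bits each haiku syllable would
# have to carry on average. Not hipku's actual encoding scheme.
HAIKU_SYLLABLES = 5 + 7 + 5  # 17 syllables in a 5-7-5 haiku


def bits_per_syllable(address_bits, syllables=HAIKU_SYLLABLES):
    """Average number of bits each syllable must encode."""
    return address_bits / syllables


ipv4 = bits_per_syllable(32)    # IPv4 addresses are 32 bits
ipv6 = bits_per_syllable(128)   # IPv6 addresses are 128 bits

print(f"IPv4: {ipv4:.2f} bits per syllable")
print(f"IPv6: {ipv6:.2f} bits per syllable")
```

At under 2 bits per syllable, IPv4 leaves room for multi-syllable words; at about 7.5 bits per syllable, IPv6 needs nearly every syllable to be its own word.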

I wonder if there are any libraries that can do this? I was thinking of writing a password generator web-app that creates full diceware sentences (TheBlubberyPythonFloatedDownThePurpleFunicular == lots of entropy and easy to remember), but I'd need a decent language model for that. (And I don't feel motivated enough to write my own.)
Why not "Article Adjective Adjective Noun Adverb Verb Article Adjective Adjective Noun"?

If you're making full sentences anyway, the grammar of the sentence doesn't need to change much. The vast majority of the entropy is already in the words themselves.

Example sentence (generated by me, not an RNG): "the tiny hairy fish quickly paints a big scary monster".

EDIT: With 10 words, each from the 252 most common words... sentences of this type would have about 10^24 possible combinations, or just under 80 bits of entropy (252^10 ≈ 2^79.8). I guess "articles" are pretty much "The" vs "A / An" however, so there really are only 8 words of note...
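For anyone who wants to check that arithmetic, here is a small sketch. It assumes every slot is filled uniformly and independently, and the function name is just made up for illustration.

```python
import math


def template_entropy_bits(slot_sizes):
    """Bits of entropy when each slot is picked uniformly and independently."""
    return sum(math.log2(n) for n in slot_sizes)


# Ten slots, each drawn from a 252-word list (the figure used above).
ten_slots = template_entropy_bits([252] * 10)

# Same template, but the two article slots only choose between 2 options.
with_articles = template_entropy_bits([2] * 2 + [252] * 8)

print(f"{252 ** 10:.3e} sentences, {ten_slots:.1f} bits")
print(f"{with_articles:.1f} bits if the articles are a two-way choice")
```

Treating the articles as a two-way choice costs roughly 14 bits compared with ten full 252-word slots.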

Sorry to double comment, but this is exactly how my hobby project hipku works

http://hipku.gabrielmartin.net

Well, one, I'd love a more general solution where I could just say "generate a sentence with n bits of entropy" and my algorithm would spin out a sentence of the correct (arbitrary) length. (Hmm... Markov chains?) Or maybe add other mnemonic modifications, like rhymes. And two, I still need an algorithm to conjugate verbs and whatnot, though I suppose that part could just be left to the user. (You get n diceware words — make your own sentence out of them.) But that's boring!

In regards to word commonality, I'm pretty sure you could in fact use something like the 5000 most common words. The people who care about this kind of stuff tend to have large vocabularies!

I think Markov chains would be a bad idea for the use case of passwords because some words always follow certain words.
Ah, true — I was thinking more along the lines of "part of speech" Markov chains, if that's even possible. (As in, just an endless stream of "article noun verb adjective noun adverb conjunction adjective noun verb adjective noun conjunction adjective etc." that could then be mad-libbed by diceware.)
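Here is a minimal sketch of that idea: cycle through a fixed part-of-speech pattern and keep drawing words until the requested number of bits is reached. The word lists and the pattern are tiny, hypothetical placeholders (a real generator would use far larger lists), and nothing here handles "a" vs "an" or verb conjugation.

```python
import math
import secrets

# Hypothetical, deliberately tiny word lists, for illustration only.
WORDS = {
    "article": ["the", "a"],
    "adjective": ["tiny", "hairy", "big", "scary",
                  "purple", "ancient", "hungry", "white"],
    "noun": ["fish", "monster", "ape", "canyon",
             "python", "funicular", "haiku", "password"],
    "adverb": ["quickly", "slowly", "loudly", "quietly"],
    "verb": ["paints", "eats", "follows", "greets"],
    "conjunction": ["and", "but", "while", "because"],
}

# Hypothetical repeating part-of-speech stream; any fixed pattern works.
PATTERN = ["article", "adjective", "noun", "adverb", "verb",
           "article", "adjective", "noun", "conjunction"]


def generate(min_bits):
    """Fill pattern slots with random words until min_bits is reached."""
    words, bits, i = [], 0.0, 0
    while bits < min_bits:
        choices = WORDS[PATTERN[i % len(PATTERN)]]
        words.append(secrets.choice(choices))  # cryptographically strong pick
        bits += math.log2(len(choices))        # uniform choice: log2(n) bits
        i += 1
    return " ".join(words), bits


sentence, bits = generate(40)
print(f"{sentence!r} ({bits:.0f} bits)")
```

Because each slot is chosen independently of the previous word, this avoids the problem with word-level Markov chains mentioned above. The catch is that the sentence stops as soon as the bit target is met, so it can end mid-phrase.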
To make it less obvious, have a few other kinds of sentence. But make those kinds clearly differentiated by length.
It would be cool if they biased the name generation with descriptions using some of the recent work in high quality machine image tagging.

Throw in some ambiguous adjectives and you should have a large enough namespace that matches up with common image contents.

See my other comment on this post for an example of this (encoding data using Markov chains)
Wikipedia is full of hidden messages. A common pattern I have observed is the first letter of a sentence being used to string together a message. You can read more about this tactic here https://uncyclopedia.wikia.com/wiki/Subliminal_Messages
Also in 2010 there was a file storage system based on Bitly: https://nealpoole.com/blog/2010/12/bit-ly-file-storage-cleve... Super clever idea, except for the part where Bitly now sprinkles links with affiliate codes to make $$$