Hacker News
by runnel 4146 days ago
Nice. I wonder what methods and tools you used to define pronounceability. Care to share some tips? I'm just curious about language-oriented programming.

Off the cuff, I'd use a phonetic algorithm such as Soundex or Metaphone to map strings to abstract phonetic representations. These could then be checked with simple regular-expression pattern matching, since English has a fairly limited set of syllable structures that make for pronounceable words.
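A minimal sketch of the regex half of this idea. It skips the Soundex/Metaphone step, because both of those codes discard most vowels, which a syllable check needs; instead it matches the lowercase letters directly against an English-like syllable shape (optional onset, a one- or two-letter vowel nucleus, optional coda). The onset and coda lists are a hypothetical, deliberately incomplete sample, and treating "y" as a vowel is an assumption.

```python
import re

# Hypothetical, deliberately small inventories of English-like clusters.
# A real checker would need a much fuller list (e.g. three-letter onsets like "str").
VOWEL = "[aeiouy]{1,2}"  # assumption: "y" counts as a vowel
CONSONANT = "[bcdfghjklmnpqrstvwxz]"
ONSETS = ["bl", "br", "ch", "cl", "cr", "dr", "fl", "fr", "gl", "gr", "pl", "pr",
          "sh", "sk", "sl", "sm", "sn", "sp", "st", "sw", "th", "tr", "wh"]
CODAS = ["ck", "ft", "lt", "mp", "nd", "ng", "nk", "nt", "rd", "rt", "sh", "sk",
         "sp", "st", "th"]


def optional_cluster(clusters):
    """Regex for an optional onset/coda: one listed cluster or a single consonant."""
    return "(?:" + "|".join(clusters + [CONSONANT]) + ")?"


SYLLABLE = optional_cluster(ONSETS) + VOWEL + optional_cluster(CODAS)
WORD = re.compile("(?:" + SYLLABLE + ")+")


def is_pronounceable(word: str) -> bool:
    """True if the whole word splits into syllables of the allowed shapes."""
    return WORD.fullmatch(word.lower()) is not None


print(is_pronounceable("ryjof"))  # True
print(is_pronounceable("thlla"))  # False
```

The cluster lists are the tunable part: "street", for example, is rejected here only because the three-letter onset "str" is missing from ONSETS.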

As with many linguistic algorithms, this approach is not language-agnostic, though. To predict pronounceability for a language other than English, you'd need different algorithms and patterns.
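To make the point concrete, the pattern can be swapped per language. Below is a minimal sketch with two hand-written, rough approximations: an English-like shape (up to two consonants on either side of a one- or two-letter vowel nucleus) and a romanized Japanese-like shape built from (C)(y)V morae plus a moraic "n". Both regexes and the `fits_language` helper are assumptions for illustration; the Japanese one ignores, for example, doubled consonants such as the "tt" in "kitte".

```python
import re

# Rough, hypothetical approximations of syllable shapes per language.
SYLLABLE_PATTERNS = {
    # English-like: up to two consonants, a 1-2 letter vowel nucleus, up to two consonants.
    "en": r"(?:[bcdfghjklmnpqrstvwxz]{0,2}[aeiouy]{1,2}[bcdfghjklmnpqrstvwxz]{0,2})+",
    # Japanese-like (romanized): optional consonant (+ optional "y") then a vowel,
    # or a standalone moraic "n"; no other consonant clusters.
    "ja": r"(?:(?:sh|ch|ts|[kgsztdnhbpmr]y?|[fjwy])?[aeiou]|n)+",
}


def fits_language(word: str, lang: str) -> bool:
    """True if the word fits the (approximate) syllable pattern for `lang`."""
    return re.fullmatch(SYLLABLE_PATTERNS[lang], word.lower()) is not None


print(fits_language("ryjof", "en"))  # True
print(fits_language("ryjof", "ja"))  # False: "ry" must be followed by a vowel
```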

A much more sophisticated approach (as opposed to the heuristic one above) would involve training a Markov model (with characters as states). More probable words (i.e. those containing a likely sequence of characters) are more likely to be easily pronounceable.
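A minimal sketch of the Markov idea, assuming a first-order chain over characters (i.e. bigram counts), add-one smoothing, and a tiny made-up training list. Scores are averaged per transition so longer names aren't penalized just for being long. `train_bigram_model`, `avg_log_prob` and `CORPUS` are illustrative names, not from any library.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # boundary markers so word starts and ends are modelled too
VOCAB_SIZE = 27        # possible next symbols: 26 letters plus END


def train_bigram_model(words):
    """Count character-to-character transitions (a first-order Markov chain)."""
    counts = defaultdict(Counter)
    for word in words:
        chars = [START, *word.lower(), END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def avg_log_prob(word, counts):
    """Mean log-probability per transition, with add-one (Laplace) smoothing."""
    chars = [START, *word.lower(), END]
    total = 0.0
    for prev, nxt in zip(chars, chars[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[nxt] + 1) / (sum(row.values()) + VOCAB_SIZE))
    return total / (len(chars) - 1)


# Hypothetical toy corpus; a real model would be trained on a large word list.
CORPUS = ["hello", "yellow", "mellow", "pillow", "follow",
          "lemon", "melon", "salad", "hotel", "total"]
model = train_bigram_model(CORPUS)

print(avg_log_prob("mello", model) > avg_log_prob("thlla", model))  # True
```

With this toy corpus "mello" scores higher (less negative) than "thlla", because its transitions (me, el, ll, lo) all occur in the training words while "th" and "hl" never do. The scores are only as good as the word list behind them.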

I'm not entirely sure it's functioning correctly; I'm struggling to pronounce:

100% thlla.com
100% sohyw.com

etc. I like the idea, though.

I would imagine sohyw is pronounced "so hew", so according to the algorithm it's pronounceable.

But really, the domain name I want is one that is SPELLABLE, not pronounceable.

Not just that: you also need it to be unambiguous. So "gitmy" (one that's mentioned below) is pretty dodgy, because it might sound too much like "getmy".
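Soundex, one of the phonetic algorithms mentioned earlier in the thread, gives a cheap ambiguity check: if two strings get the same code, they probably sound alike. Python's standard library has no Soundex, so here is a minimal sketch of the classic American Soundex rules; the `sounds_alike` helper is hypothetical.

```python
# Classic American Soundex digit groups.
CODES = {letter: digit
         for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                "4": "l", "5": "mn", "6": "r"}.items()
         for letter in letters}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code, e.g. "Robert" -> "R163"."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        # Vowels (and y) separate repeated codes; h and w do not.
        if ch not in "hw":
            prev = code
    return ("".join(result) + "000")[:4]


def sounds_alike(a: str, b: str) -> bool:
    """Hypothetical helper: flag names whose Soundex codes collide."""
    return soundex(a) == soundex(b)


print(soundex("gitmy"), soundex("getmy"))  # G350 G350
print(sounds_alike("gitmy", "getmy"))      # True
```

Soundex is coarse (it keeps only the first letter and up to three consonant digits), so a shared code is a hint to check by ear rather than proof of a clash.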

ryjof.com

100% pronounceable.

From a quick glance, it appears to me that all of the domains follow a pattern of alternating consonants and vowels, with one additional rule: two identical consonants may appear in a row (e.g. "mm" or "ll"). Combinations of letters that alternate between vowels and consonants are typically at least somewhat pronounceable.
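The rule guessed above can be written as a small check. This is only a sketch of the commenter's inferred rule, not the site's actual code. It treats "y" as a vowel (an assumption, needed for ryjof and sohyw to pass), and it rejects thlla, which the site rated 100%, so the real rule presumably differs.

```python
VOWELS = set("aeiouy")  # assumption: "y" counts as a vowel


def follows_alternation_rule(word: str) -> bool:
    """Consonants and vowels alternate; a consonant may be doubled ("mm", "ll")."""
    word = word.lower()
    if not word.isalpha():
        return False
    i = 0
    prev_is_vowel = None
    while i < len(word):
        ch = word[i]
        is_vowel = ch in VOWELS
        if prev_is_vowel is not None and is_vowel == prev_is_vowel:
            return False  # two vowels, or two different consonants, in a row
        # A doubled identical consonant occupies a single consonant slot.
        if not is_vowel and i + 1 < len(word) and word[i + 1] == ch:
            i += 2
        else:
            i += 1
        prev_is_vowel = is_vowel
    return True


print(follows_alternation_rule("hello"))  # True
print(follows_alternation_rule("thlla"))  # False
```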

I also wonder what the difference is between a word that is, say, 80% pronounceable and one that is 60% pronounceable. Is it not a binary thing?
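One hypothetical way a percentage could arise (not necessarily what the site does, given that it rates thlla.com at 100%) is partial credit: instead of failing a whole word on one bad pair, score the share of adjacent letter pairs that fit the alternation rule described above. The metric and the `pronounceability_percent` name are assumptions for illustration.

```python
VOWELS = set("aeiouy")  # assumption: "y" counts as a vowel


def pronounceability_percent(word: str) -> float:
    """Share of adjacent letter pairs that are 'easy' transitions (hypothetical metric)."""
    word = word.lower()
    pairs = list(zip(word, word[1:]))
    if not pairs:
        return 100.0 if word else 0.0

    def easy(a: str, b: str) -> bool:
        a_vowel, b_vowel = a in VOWELS, b in VOWELS
        # Easy: a consonant/vowel switch, or a doubled identical consonant.
        return a_vowel != b_vowel or (a == b and not a_vowel)

    return 100.0 * sum(easy(a, b) for a, b in pairs) / len(pairs)


print(pronounceability_percent("sohyw"))  # 100.0
print(pronounceability_percent("gitmy"))  # 75.0
print(pronounceability_percent("thlla"))  # 50.0
```

Under a graded score like this, 80% versus 60% would just mean fewer or more awkward consonant clusters, which is closer to how pronounceability feels than a yes/no answer.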