Nice. I wonder what methods and tools you used to define pronounceability. Care to share some tips? I'm just curious about language-oriented programming.
Off the cuff, I'd use a phonetic algorithm such as Soundex or Metaphone to map strings to abstract phonetic representations. These could then be run through simple regular expression pattern matching. English has a fairly limited set of syllable structures that make for pronounceable words.
As with many linguistic algorithms, though, this approach is not language-agnostic. If you wanted to predict pronounceability for a language other than English, you'd need different algorithms and patterns.
A much more sophisticated approach (as opposed to the heuristic one above) would involve training a Markov model (with characters as states). More probable words (i.e. those containing a likely sequence of characters) are more likely to be easily pronounceable.
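For what it's worth, here is a rough Python sketch of the Markov idea: count character-to-character transitions in a word list, then score a candidate by its average log-probability per transition. The function names (`train`, `avg_log_prob`), the `^`/`$` boundary markers, the add-one smoothing, and the 27-symbol alphabet (26 letters plus the end marker) are all my own assumptions for illustration, not anything the original poster described.

```python
import math
from collections import defaultdict


def train(words):
    """Count character bigram transitions; ^ and $ mark word start and end."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = "^" + word.lower() + "$"
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts


def avg_log_prob(word, counts, alphabet_size=27, alpha=1.0):
    """Average log-probability per transition, with add-alpha smoothing.

    alphabet_size is an assumed number of possible next symbols
    (26 letters plus the end marker).
    """
    chars = "^" + word.lower() + "$"
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        row = counts.get(a, {})
        numerator = row.get(b, 0) + alpha
        denominator = sum(row.values()) + alpha * alphabet_size
        total += math.log(numerator / denominator)
    return total / (len(chars) - 1)
```

Trained on a reasonably large English word list, higher (less negative) scores should roughly track "looks pronounceable". Where to put the cutoff is something you'd have to tune by eye.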
From a quick glance, it appears that all of the domains alternate consonants and vowels, with one additional rule: two identical consonants in a row are allowed (e.g. "mm" or "ll"). Letter combinations that alternate between vowels and consonants are typically at least somewhat pronounceable.
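If that guess is right, the rule could be checked with something like the Python sketch below. This is my reading of the pattern and not the author's actual code. The name `alternates` is made up, and treating "y" as a consonant is an assumption.

```python
import re

VOWELS = set("aeiou")  # assumption: "y" counts as a consonant


def alternates(word):
    """True if the word alternates consonants and vowels,
    allowing one doubled identical consonant (e.g. "mm", "ll")."""
    word = word.lower()
    if not re.fullmatch(r"[a-z]+", word):
        return False
    # Collapse a doubled identical consonant into a single one.
    collapsed = re.sub(r"([^aeiou])\1", r"\1", word)
    kinds = [c in VOWELS for c in collapsed]
    # Every adjacent pair must be one vowel and one consonant.
    return all(a != b for a, b in zip(kinds, kinds[1:]))
```

With this, "banana" and "common" pass, while "string" (consonant cluster) and "boot" (two vowels in a row) fail.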