Hacker News
by BjoernKW 4146 days ago
Off the cuff, I'd use a phonetic algorithm such as Soundex or Metaphone to map strings to abstract phonetic representations. These could then be checked with simple regular-expression pattern matching: English has a fairly limited set of syllable structures that make for pronounceable words.
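Here is a minimal sketch of the pattern-matching step in Python. Soundex codes keep only the first letter plus digit codes for the consonants, so they throw away the vowels a syllable check needs. This sketch therefore swaps in a much cruder phonetic representation that maps each letter to C (consonant) or V (vowel). The function names `cv_pattern` and `is_pronounceable` and the cluster-length limits are hypothetical heuristics rather than established values, and the template only makes sense for English spelling.

```python
import re

# Assumption: 'y' counts as a vowel (as in "rhythm"); every other letter is a consonant.
VOWELS = frozenset("aeiouy")

# Hypothetical heuristic limits on consonant-cluster length, counted in letters.
MAX_ONSET = 3   # word-initial cluster, e.g. "str" in "strong"
MAX_MEDIAL = 5  # cluster between two vowels (coda of one syllable + onset of the next)
MAX_CODA = 4    # word-final cluster, e.g. "ngth" in "length"

# A word is: optional onset, a vowel run, then any number of
# (consonant cluster + vowel run) pairs, then an optional coda.
# Expands to: C{0,3}V+(?:C{1,5}V+)*C{0,4}
PATTERN = re.compile(
    rf"C{{0,{MAX_ONSET}}}V+(?:C{{1,{MAX_MEDIAL}}}V+)*C{{0,{MAX_CODA}}}"
)


def cv_pattern(word: str) -> str:
    """Map each letter to 'C' or 'V', skipping non-letters."""
    return "".join(
        "V" if ch in VOWELS else "C" for ch in word.lower() if ch.isalpha()
    )


def is_pronounceable(word: str) -> bool:
    """True if the word's C/V skeleton fits the syllable template."""
    return PATTERN.fullmatch(cv_pattern(word)) is not None


if __name__ == "__main__":
    for w in ["banana", "strength", "xkcdq", "glorpstrk"]:
        print(w, is_pronounceable(w))
```

With these limits "strength" passes, "xkcdq" fails because it has no vowel, and "glorpstrk" fails because its final cluster is six consonants long. Tuning the limits trades false positives against false negatives; real words like "tsktsk" show that any fixed template is only a heuristic.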

As with many linguistic algorithms, though, this approach is not language-agnostic. To predict pronounceability for a language other than English, you'd need different algorithms and patterns.

A much more sophisticated alternative to the heuristic above would be to train a Markov model, with characters as states, on a corpus of real words. Words the model considers more probable (i.e. those made up of likely character sequences) are more likely to be easy to pronounce.
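A minimal sketch of the Markov approach in Python: a first-order (bigram) model whose states are characters, trained by counting character-to-character transitions in a word list. Two choices here are assumptions, not part of the original suggestion: add-one (Laplace) smoothing so unseen transitions don't get zero probability, and averaging the log-probability over transitions so longer words aren't penalized just for being long. The class name `CharMarkovModel`, the `^`/`$` boundary symbols, and the tiny corpus are hypothetical; a useful model would be trained on a large English word list.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary symbols, assumed never to appear in words


class CharMarkovModel:
    """First-order (bigram) Markov model whose states are characters."""

    def __init__(self, words):
        self.counts = defaultdict(Counter)  # counts[a][b] = times b followed a
        self.symbols = {END}                # every symbol that can follow a state
        for w in words:
            chars = [START, *w.lower(), END]
            self.symbols.update(chars[1:-1])
            for a, b in zip(chars, chars[1:]):
                self.counts[a][b] += 1

    def log_prob(self, a, b):
        """log P(b | a) with add-one smoothing over the training symbols."""
        row = self.counts.get(a, Counter())
        return math.log((row[b] + 1) / (sum(row.values()) + len(self.symbols)))

    def score(self, word):
        """Average log-probability per transition; higher means more word-like."""
        chars = [START, *word.lower(), END]
        pairs = list(zip(chars, chars[1:]))
        return sum(self.log_prob(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    corpus = ["banana", "bandana", "cabana", "panama", "ban",
              "can", "man", "nab", "tan", "ant"]
    model = CharMarkovModel(corpus)
    print(model.score("manana") > model.score("nnbmcpt"))  # True
```

Even on this toy corpus, "manana" reuses common transitions like "an" and "na" and outscores the consonant jumble "nnbmcpt", most of whose transitions were never seen. A higher-order model (conditioning on the previous two or three characters) would capture syllable structure better, at the cost of needing more training data.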