by jacobharris 4821 days ago
Yes, to clarify: I started with the base CMUdict for syllable counts, but I had the program keep track of every term it missed so that I could augment its vocabulary. That also helped me find some tokenization bugs and try out some rules for dealing with compound words like "unsportsmanlike".
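
For anyone curious what that lookup-plus-miss-tracking might look like, here is a minimal sketch, not the actual program. It assumes CMUdict's plain-text layout (a word followed by its phonemes, with vowel phonemes ending in a stress digit), so the syllable count is just the number of phonemes that end in a digit. The names `SAMPLE_CMUDICT`, `load_syllable_counts` and `SyllableCounter` are made up for illustration, and the three entries are a tiny stand-in for the real dictionary file.

```python
from collections import Counter

# Tiny illustrative sample in CMUdict's "WORD  PH1 PH2 ..." layout.
# In practice you would read the full dictionary file instead.
SAMPLE_CMUDICT = """\
HELLO  HH AH0 L OW1
PROGRAM  P R OW1 G R AE2 M
SYLLABLE  S IH1 L AH0 B AH0 L
"""


def load_syllable_counts(text):
    """Map each lowercased word to the syllable count of its first pronunciation."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        word, *phones = line.split()
        # Vowel phonemes carry a stress digit (0, 1 or 2); one vowel per syllable.
        syllables = sum(1 for p in phones if p[-1].isdigit())
        counts.setdefault(word.lower(), syllables)
    return counts


class SyllableCounter:
    """Dictionary lookup that remembers every word it could not find."""

    def __init__(self, counts):
        self.counts = counts
        self.misses = Counter()

    def count(self, word):
        key = word.lower()
        if key in self.counts:
            return self.counts[key]
        self.misses[key] += 1  # record the miss so the vocabulary can be extended later
        return None


counter = SyllableCounter(load_syllable_counts(SAMPLE_CMUDICT))
print(counter.count("Hello"))
print(counter.count("unsportsmanlike"))
print(counter.misses.most_common())
```

Reviewing `misses.most_common()` after a run gives a frequency-sorted list of words to add by hand, which is presumably also where tokenization bugs would show up as odd-looking "words".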
1 comment

One approximate hack that works pretty well is to count the blocks of consecutive vowels, each separated by consonants. It breaks on some words, but it was close enough for something I was working on (data-mining rhymes from lyrics).
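
A rough sketch of that vowel-block count, in case it helps. Treating "y" as a vowel is my own assumption here, and `approx_syllables` is just an illustrative name.

```python
import re

# One or more consecutive vowel letters; counting "y" as a vowel is an assumption.
VOWEL_RUN = re.compile(r"[aeiouy]+")


def approx_syllables(word):
    """Estimate syllables as the number of vowel runs in the word."""
    return len(VOWEL_RUN.findall(word.lower()))


for w in ["banana", "lyrics", "make", "unsportsmanlike"]:
    print(w, approx_syllables(w))
```

It gets "banana" (3) and "lyrics" (2) right, but a silent "e" throws it off: "make" comes out as 2 and "unsportsmanlike" as 5. That is the kind of word it breaks on.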