by woodson 2743 days ago
IMHO, the TIMIT corpus should no longer be used in most application-driven speech recognition research, as it’s small and completely unrealistic for any real-world application. Furthermore, nobody cares about phone error rates, as recognizing phones is not the ultimate goal.

Much better, larger datasets have been available for a long time. For example, the Fisher English conversational telephone speech corpus was released in 2004 and contains ~1950h of transcribed speech. There are tons of other datasets in various languages and for various applications (conversational speech, broadcast transcription, etc.).

1 comment

Isn't there some value in being able to benchmark acoustic models in isolation, no matter how weak they may be, without downstream language models?
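
For anyone unfamiliar with the metric being argued about: phone error rate is normally computed as the edit distance (substitutions, deletions, insertions) between the reference and recognized phone sequences, divided by the number of reference phones. Here is a rough sketch of that calculation. The function name and the phone sequences are made up for illustration and do not come from TIMIT or from any particular toolkit's scoring script.

```python
def phone_error_rate(reference, hypothesis):
    """Edit distance between two phone sequences, divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j phones of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference phone missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra phone in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example sequences (not real recognizer output).
ref = ["sh", "iy", "hh", "ae", "d"]
hyp = ["sh", "iy", "ae", "t"]
per = phone_error_rate(ref, hyp)  # one deletion + one substitution over 5 phones
```

No language model or lexicon is involved in this score, which is why it can be used to compare acoustic models on their own.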