

by daanzu 2176 days ago

Collecting and publishing such a dataset would be great, and would certainly improve baseline speech recognition for people with disordered speech, but it can only help so much without personalizing to a specific individual. This is true for anybody, but more so for disordered speech. This is an area where I think "generic" solutions will inevitably struggle, even if they are somewhat specialized for "generic dysarthric" speech.

However, this means that the gains to be had from personalized training are greater for disordered speech than for "average" speech. I develop kaldi-active-grammar [0], which specializes the Kaldi speech recognition engine for real-time command & control with many complex grammars. I am also working on making it easier to train personalized speech models, and to fine-tune generic models with training data from an individual. I have posted basic numbers from some small experiments [1]. Such personalized training can be time-consuming (depending on how far one wants to take it), but as my parent comment says, disabled people may need to rely more on ASR, which means they have that much more to gain by investing the time in training.

Nevertheless, a Common Voice disordered speech dataset would be quite helpful, both for research and for pre-training models that can then be personalized with further training. It is good to see (in my sibling comment) that it is being discussed.

[0] https://github.com/daanzu/kaldi-active-grammar

[1] https://github.com/daanzu/kaldi-active-grammar/blob/master/d...