by Jingyi0321
353 days ago
|
|
In general, WebRTC VAD relies on pitch information. Note that pitch is present only in voiced speech, not in unvoiced speech. Because of this, WebRTC VAD may fail to detect the start of a word that begins with an unvoiced sound, cutting off that onset, which in turn can increase WER in a downstream ASR system. Conversely, noise whose spectrum resembles voiced speech, e.g. music, may be assigned a non-zero pitch by WebRTC VAD's pitch detection and so be mistaken for speech. Our model combines fbank features with the pitch information and can model the input pattern in more depth, and therefore performs better than WebRTC VAD. |
|
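To make the voiced/unvoiced point concrete, here is a rough sketch of a pitch-style periodicity measure. It is not WebRTC's actual code, only a toy normalized autocorrelation; the 16 kHz sample rate, 30 ms frame, 60-400 Hz search range, and the synthetic signals (harmonic tone for voiced speech, white noise for an unvoiced onset, pure tone for music) are all assumptions chosen for illustration.

```python
import math
import random

SAMPLE_RATE = 16000             # Hz (assumed)
FRAME_LEN = 480                 # 30 ms frame at 16 kHz (assumed)
MIN_LAG = SAMPLE_RATE // 400    # shortest pitch period searched (400 Hz)
MAX_LAG = SAMPLE_RATE // 60     # longest pitch period searched (60 Hz)


def periodicity(frame):
    """Return (best_lag, score): the peak normalized autocorrelation
    over the pitch lag range. A score near 1 means strongly periodic."""
    best_lag, best_score = 0, 0.0
    for lag in range(MIN_LAG, MAX_LAG + 1):
        a = frame[:-lag]
        b = frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


def harmonic_frame(f0, n_harmonics):
    """Synthetic periodic frame: f0 plus harmonics with 1/k amplitudes."""
    return [
        sum(math.sin(2 * math.pi * f0 * k * n / SAMPLE_RATE) / k
            for k in range(1, n_harmonics + 1))
        for n in range(FRAME_LEN)
    ]


rng = random.Random(0)  # fixed seed so the noise frame is reproducible

voiced = harmonic_frame(125.0, 5)                            # stand-in for a vowel
unvoiced = [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]   # stand-in for /s/, /f/
music = harmonic_frame(250.0, 1)                             # stand-in for a sustained note

for name, frame in [("voiced", voiced), ("unvoiced", unvoiced), ("music", music)]:
    lag, score = periodicity(frame)
    print(f"{name:9s} best_lag={lag:3d} periodicity={score:.2f}")
```

A detector that thresholds only this kind of score would skip the noise-like frame (the unvoiced word start) and accept the pure tone (the music case), which are the two failure modes described above.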