by rahimnathwani
2918 days ago
That codec sounds great, if it exists. If you have such a codec, it would be worth testing the word error rate on a long sample of audio. For example, take a few hours of call centre recordings, pass them through each of {your codec, CODEC2}, and then have a human transcribe each of:

- the original recording
- the audio output from your proposed codec (which presumably does STT followed by TTS)
- the audio output from CODEC2 at 2048

Based on the current state of open source single-language STT models, I would imagine that the CODEC2 transcript would be much closer to the original. And if the input audio contains two or more languages, I cannot imagine the output of your codec will be useful at all.
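Scoring that comparison could look something like the rough sketch below. It computes word error rate as the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and a hypothesis transcript, divided by the number of words in the reference. The function name and the lowercase/whitespace normalisation are my own assumptions for illustration, not taken from any particular toolkit, and a real evaluation would also want to normalise punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: normalisation here is just lowercasing and
    splitting on whitespace, which is an assumption, not a standard.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

You would call it once per codec, with the human transcript of the original recording as `reference` and the human transcript of that codec's output as `hypothesis`, then compare the two numbers. Lower is better, and the value can exceed 1.0 if the hypothesis contains many extra words.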