Hacker News
by rahimnathwani 2918 days ago
That codec sounds great, if it exists.

If you have such a codec, it would be worth measuring the word error rate on a long sample of audio. For example, take a few hours of call centre recordings, pass them through each of your codec and CODEC2, and then have a human transcribe each of:

- the original recording

- the audio output from your proposed codec (which presumably does STT followed by TTS)

- the audio output from CODEC2 at 2048 bit/s
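Once those transcripts exist, one way the comparison could be scored is sketched below. It assumes the human transcript of the original recording serves as the reference. Word error rate is the word-level edit distance (substitutions S, deletions D, insertions I) divided by the number of reference words N. The function name `word_error_rate` and the lowercase-and-split normalization are hypothetical illustration choices, not any particular toolkit's API. A real evaluation would probably also strip punctuation and normalize numbers before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N, where N is the number of reference words.

    Assumption: transcripts are normalized only by lowercasing and
    splitting on whitespace (a deliberately simple, hypothetical choice).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Rolling one-row Levenshtein table over words: prev[j] is the edit
    # distance between the first i-1 reference words and the first j
    # hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up strings for illustration only, not real transcripts.
    reference = "thank you for calling how can i help"
    hypothesis = "thank you for falling how i help"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")  # WER = 25.00%
```

In that toy pair, "calling" becomes "falling" (one substitution) and "can" is dropped (one deletion), so the distance is 2 over 8 reference words. Running the same function over every codec's transcript against the same reference gives directly comparable numbers.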

Given the current state of open-source single-language STT models, I would expect the transcript of the CODEC2 output to be much closer to the transcript of the original. And if the input audio contains two or more languages, I cannot imagine your codec's output being useful at all.