| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by loodish 512 days ago

AMR (adaptive multi-rate audio codec) can get down to 4.75 kbit/s when there's low bandwidth available, which is typically what people complain about as being terrible quality.

The speech codecs are complex and fascinating, very different from just doing a frequency filter and compressing.

The base is linear predictive coding, which encodes the voice based on a simple model of the human mouth and throat. Huge compression but it sounds terrible. Then you take the error between the original signal and the LPC encoded signal, this waveform is compressed heavily but more conventionally and transmitted along with the LPC signal.

Phones also layer on voice activity detection, when you aren't talking the system just transmits noise parameters and the other end hears some tailored white noise. As phone calls typically have one person speaking at a time and there are frequent pauses in speech this is a huge win. But it also makes mistakes, especially in noisy environments (like call centers, voice calls are the business, why are they so bad?). When this happens the system becomes unintelligible because it isn't even trying to encode the voice.