by nmfisher 1016 days ago
Lyra is a real-time neural speech codec from Google - I don't know if they use it in the Pixel line for call compression, but they certainly could.

Interestingly, I had the idea of using their open-source version as a vocoder for a lightweight TTS model. It did work - as in, it produced intelligible speech - but with very rough audio quality on the validation set. No matter what I tweaked, after 1-2 epochs the validation error would always diverge from the training error, which to me suggests considerable redundancy in the compressed representations (i.e. two clips of perceptually similar audio can map to different representations, so the TTS model has difficulty learning the underlying loss surface). I suspect there's still a lot more entropy to be squeezed out of it. The EnCodec authors encountered something similar, compressing their codec by a further 40% by simply layering a language model over the top.
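To make the "more entropy to be squeezed out" point concrete, here is a toy sketch. It is not Lyra or EnCodec code: the token stream is synthetic, and the helper names (toy_token_stream, order0_bits_per_token, order1_bits_per_token) are made up for illustration. It compares the bits per token needed when each token is coded on its own against the bits needed when a simple context model (here just a bigram table, standing in for a real language model) predicts each token from the previous one.

```python
import math
import random
from collections import Counter


def order0_bits_per_token(tokens):
    """Empirical entropy H(X) in bits per token, ignoring all context."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def order1_bits_per_token(tokens):
    """Empirical conditional entropy H(X_t | X_{t-1}) in bits per token."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    n = len(tokens) - 1
    return -sum(
        c / n * math.log2(c / prev_counts[prev])
        for (prev, _cur), c in pair_counts.items()
    )


def toy_token_stream(length, vocab_size=16, stay_prob=0.8, seed=0):
    """Hypothetical stand-in for codec tokens: each token tends to repeat."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size)]
    for _ in range(length - 1):
        if rng.random() < stay_prob:
            tokens.append(tokens[-1])  # redundant: same code as last frame
        else:
            tokens.append(rng.randrange(vocab_size))
    return tokens


tokens = toy_token_stream(20000)
h0 = order0_bits_per_token(tokens)
h1 = order1_bits_per_token(tokens)
saving = 1 - h1 / h0  # fraction of bits a context-aware coder could drop

print(f"no context:    {h0:.2f} bits/token")
print(f"bigram model:  {h1:.2f} bits/token")
print(f"saving:        {saving:.0%}")
```

The gap between the two numbers is the redundancy a predictive model can remove with an entropy coder on top. The figure printed here says nothing about real codecs, since it depends entirely on the made-up stay_prob parameter.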