It's fundamentally very similar to the sorts of issues you end up with if you compress then encrypt. If the attacker can make some educated guesses about the plaintext prior to the compression, the compression ratio can be a very powerful tool in their arsenal.
On the other hand, if you could do it, you'd probably have invented a convoluted speech-to-text (where text is a index into a dictionary of words). Note that you would also likely lose things like inflection, voice, accent etc - so while it might work as a texting system with voice input - it would be a poor substitute for voice chat..
Do you have a link discussing this?