Yes, but we could look at the total amount of data transmitted. Audio compression is well understood, so we can infer, within a range of usable quality, whether any excess voice or other data is being sent over the network.
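To make that concrete, here is a rough back-of-the-envelope sketch. The bitrate bounds (8 and 64 kbit/s, typical of telephony speech codecs) and the 20% protocol overhead allowance are illustrative assumptions, not measurements of any real device, and the function names are made up for this example.

```python
# Assumed bitrate bounds for usable speech audio, in kilobits per second.
# Illustrative values only: not measured from any real smart speaker.
MIN_KBPS = 8.0    # low-bitrate speech codec territory
MAX_KBPS = 64.0   # telephony-grade audio

def expected_bytes(seconds, kbps):
    """Bytes needed to carry `seconds` of audio at `kbps` kilobits per second."""
    return seconds * kbps * 1000 / 8

def expected_range(seconds):
    """(low, high) byte range for `seconds` of speech at usable quality."""
    return expected_bytes(seconds, MIN_KBPS), expected_bytes(seconds, MAX_KBPS)

def excess_bytes(observed_bytes, spoken_seconds, overhead_fraction=0.2):
    """Bytes observed beyond a generous upper bound for the known speech.

    overhead_fraction is an assumed allowance for headers and encryption.
    Returns 0.0 when the traffic fits inside the bound.
    """
    _, high = expected_range(spoken_seconds)
    upper = high * (1 + overhead_fraction)
    return max(0.0, observed_bytes - upper)
```

If you spoke to the device for 5 seconds and captured the upload size on your router, `excess_bytes(captured, 5)` staying at zero means the traffic is consistent with just that speech. A consistently large positive number would be worth a closer look.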
So what you're saying is, if a company like Amazon or Google has the excess bandwidth, it is beneficial for them to send way too much data in the first place in order to disguise what data is actually being sent.
ASR is a hugely complex process that is handled by ML algorithms on Amazon's servers. The Echo simply does not have the hardware to handle this on its own.
Is it though? Not trying to be argumentative, but I remember using Dragon NaturallySpeaking to do voice dictation way back in like '98, on a processor that makes today's average smartphone look like a supercomputer. I thought all the ML stuff was for figuring out context and the like, but straight transcription?
Now, there is some security basis:
http://www.cs.unc.edu/~fabian/papers/tissec2010.pdf
>Uncovering Spoken Phrases in Encrypted Voice over IP Conversations