Surely it's possible to sniff what's being sent back to Amazon's servers? If Amazon are lying and they are storing/analysing everything the Echo hears, surely this would be easy to prove?
It is possible to see all the traffic it sends. You could possibly even set up a fake certificate authority (depending on how resilient the Alexa is to this kind of tampering) and trick the Alexa into encrypting the data it sends with a key that you control.
However, this line of reasoning can be refuted all the way down to being impossible to prove or disprove. For example, there is very likely an audio processing chip in the Alexa that does always-on keyword listening, and it could conceivably track breadcrumbs over time (e.g., voice fingerprints, trigger keywords like "bomb", etc.). This data could then be interleaved with innocuous data, for example inside an access token (the opaque blob used to identify on whose behalf the Alexa is making requests). That would make it virtually impossible to find even if you had full access to the network traffic.
Anyway, when it comes to these things I like to take an Occam's razor approach. There are a great number of things a company can do to spy on you, but when it comes to mass surveillance it's most likely easier to tap into more obvious sources of data, like your browsing history from the ISP, your phone line, or Facebook/Google tracking data. In fact, I'd be more scared of, say, Facebook's and Google's voice assistants than Amazon's or Apple's, because the latter two don't depend as much on consumer identity as a business.
EDIT: Another thing that just came to mind. Even if you analyze the network traffic and observe that traffic only occurs during your queries (i.e. in the seconds after the hotword is uttered), that doesn't mean the Echo won't use the opportunity to send some previously recorded audio to the server along with the current recording. It's the same way clever hackers disguise themselves: by making their network traffic mimic the shape and direction of legitimate traffic.
Yes, but we could look at the total amount of data transmitted. Audio compression is well understood, so we can infer, within a range of usable quality, whether any excess voice or other data is being sent over the network.
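A rough sketch of that "byte budget" check, assuming you have already captured, per query, how long you spoke and how many bytes left the device. The codec bitrate, fixed per-query overhead and tolerance factor below are hypothetical placeholder values, not measured Echo figures; you would have to calibrate them yourself.

```python
from dataclasses import dataclass


@dataclass
class Query:
    speech_seconds: float  # how long the spoken request lasted
    bytes_uploaded: int    # bytes observed leaving the device for this query


def expected_upload_bytes(speech_seconds: float, bitrate_bps: int, overhead_bytes: int) -> float:
    # Compressed audio size (bits/s * s / 8) plus a fixed allowance for
    # TLS handshakes, headers, tokens, etc. (hypothetical allowance)
    return speech_seconds * bitrate_bps / 8 + overhead_bytes


def flag_excess(queries, bitrate_bps=32_000, overhead_bytes=20_000, tolerance=1.5):
    # Return the queries whose upload exceeds the expected size by more
    # than the tolerance factor (all defaults are assumed, not measured)
    flagged = []
    for q in queries:
        budget = expected_upload_bytes(q.speech_seconds, bitrate_bps, overhead_bytes) * tolerance
        if q.bytes_uploaded > budget:
            flagged.append(q)
    return flagged


if __name__ == "__main__":
    queries = [Query(3.0, 40_000), Query(3.0, 250_000)]
    for q in flag_excess(queries):
        print("suspicious:", q)
```

With these assumed numbers, a 3-second request has a budget of 48,000 bytes, so the 40,000-byte upload passes and the 250,000-byte one is flagged. A flag only says "more than expected", not what the extra bytes contain.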
So what you're saying is, if a company like Amazon or Google has the excess bandwidth, it is beneficial for them to send way too much data in the first place in order to disguise what data is actually being sent.
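To illustrate why padding would defeat a size check, here is a toy sketch in which every upload is rounded up to a fixed bucket size. The 256,000-byte bucket is a hypothetical value chosen for illustration; nothing here claims Amazon or Google actually does this.

```python
import math


def padded_size(payload_bytes: int, bucket_bytes: int = 256_000) -> int:
    # Round the payload up to the next whole bucket; even an empty
    # payload is sent as one full bucket (hypothetical padding scheme)
    buckets = max(1, math.ceil(payload_bytes / bucket_bytes))
    return buckets * bucket_bytes


if __name__ == "__main__":
    honest = 12_000           # ~3 s of speech at an assumed 32 kbps
    smuggled = 12_000 + 100_000  # same speech plus 100 kB of hidden data
    print(padded_size(honest), padded_size(smuggled))
```

Both cases go out as 256,000 bytes, so an observer counting bytes on the wire cannot tell them apart. The cost is wasted bandwidth, which is exactly the trade-off described above.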
ASR (automatic speech recognition) is a hugely complex process that is handled by ML algorithms on Amazon's servers. The Echo simply does not have the hardware to handle this on its own.
Is it though? Not trying to be argumentative, but I remember using Dragon NaturallySpeaking for voice dictation way back in '98 or so, on a processor that makes today's average smartphone look like a supercomputer. I thought all the ML stuff was for figuring out context and the like, but for straight transcription?
I'm sure you could use Wireshark to see what requests are being made. However, they very likely use TLS, so getting at the content of those requests would be extremely difficult, if not completely impossible.
However, if you don't mind potentially destroying your echo, I'm sure you could reverse engineer a way to see what's going on.
As far as I know, only this year's Echo models don't have a known way to root them, so you could likely circumvent the encryption on an older model to inspect traffic. I'm not aware of any publicized results of someone doing that though, and it doesn't necessarily tell you what the backend can and can't extract from the audio data.