| HN Mirror


	by spobin 3150 days ago
	Surely it's possible to sniff what's being sent back to Amazon's servers? If Amazon are lying and they are storing/analysing everything the Echo hears, surely this would be easy to prove?

5 comments

blixt 3149 days ago

It is possible to see all the traffic it sends, and possibly even to use a fake certificate authority (depending on how resilient the Alexa is to this kind of tampering) to trick the Alexa into encrypting the data it sends with a key that you control.

However, this line of reasoning can be pushed all the way down to the point where it becomes impossible to prove or disprove. For example, there is presumably an audio processing chip in the Alexa that does always-on keyword listening, and it could track breadcrumbs over time (e.g., voice fingerprints, or trigger words like "bomb"). This data could then be interleaved with innocuous data, for example inside an access token (an opaque blob used to identify on whose behalf the Alexa is making requests). That would make it virtually impossible to find, even with full access to the network traffic.

Anyway, when it comes to these things I like to take an Occam's razor approach. There are a great number of ways a company could spy on you, but for mass surveillance it's most likely easier to tap into more obvious sources of data, like your browsing history from the ISP, your phone line, or Facebook/Google tracking data. In fact, I'd be more worried about, say, Facebook's and Google's voice assistants than Amazon's or Apple's, because the latter two don't depend as much on consumer identity as a business.

majewsky 3150 days ago

Strong encryption is a thing.

EDIT: Another thing that just came to my mind: even if you analyze network traffic and observe that traffic only occurs during your queries (i.e. in the seconds after the hotword is uttered), that doesn't mean the Echo won't use the opportunity to send some previously recorded audio to the server along with the current recording. It's the same way clever hackers disguise themselves: by making their network traffic mimic the shape and direction of legitimate network traffic.

johnjac 3150 days ago

Yes, but we could look at the total amount of data transmitted. Audio compression is well understood, so given a range of usable quality we can infer whether any excess voice or other data is being sent over the network.
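A rough version of that volume check, as a minimal Python sketch. The bitrate ceiling, the fixed allowance for protocol overhead, and the tolerance are all assumptions you would have to measure for a real device; nothing here reflects how the Echo actually encodes audio.

```python
def expected_audio_bytes(duration_s: float, bitrate_kbps: float) -> float:
    """Bytes needed to carry `duration_s` seconds of audio at `bitrate_kbps`."""
    return duration_s * bitrate_kbps * 1000 / 8


def excess_ratio(observed_bytes: int, duration_s: float,
                 max_kbps: float, overhead_bytes: int = 0) -> float:
    """How many times larger the upload is than the most generous estimate.

    `max_kbps` is an assumed upper bound on the codec bitrate and
    `overhead_bytes` an assumed allowance for TLS/HTTP framing.
    """
    upper = expected_audio_bytes(duration_s, max_kbps) + overhead_bytes
    return observed_bytes / upper


def looks_excessive(observed_bytes: int, duration_s: float, max_kbps: float,
                    overhead_bytes: int = 0, tolerance: float = 1.5) -> bool:
    """Flag uploads clearly larger than the query alone can explain."""
    return excess_ratio(observed_bytes, duration_s, max_kbps,
                        overhead_bytes) > tolerance


if __name__ == "__main__":
    # Hypothetical numbers: a 4 s query, codec assumed to use at most 32 kbps.
    print(excess_ratio(180_000, duration_s=4, max_kbps=32, overhead_bytes=4_000))
```

Note that passing this check only bounds how much extra could be hidden: anything that fits inside the tolerance, such as the small piggybacked payload majewsky describes, would not be flagged.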

pixl97 3149 days ago

So what you're saying is, if a company like Amazon or Google has the excess bandwidth, it is beneficial for them to send way too much data in the first place in order to disguise what data is actually being sent.

Now, there is some security basis for this:

http://www.cs.unc.edu/~fabian/papers/tissec2010.pdf

>Uncovering Spoken Phrases in Encrypted Voice over IP Conversations
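A common countermeasure to length-based inference is padding every message up to a fixed bucket size, which trades bandwidth for hiding the true payload length. The sketch below is a generic illustration, not a claim about what Amazon, Google, or the linked paper do; the 4096-byte bucket and the 4-byte length prefix are arbitrary assumptions.

```python
import struct


def pad(payload: bytes, bucket: int = 4096) -> bytes:
    """Prefix the real length, then zero-pad up to a multiple of `bucket`.

    An observer of ciphertext lengths learns only how many buckets were
    used, not the exact payload size.
    """
    if bucket <= 0:
        raise ValueError("bucket must be positive")
    framed = struct.pack(">I", len(payload)) + payload  # 4-byte big-endian length
    target = -(-len(framed) // bucket) * bucket          # round up to a bucket multiple
    return framed + b"\x00" * (target - len(framed))


def unpad(blob: bytes) -> bytes:
    """Recover the original payload using the length prefix."""
    (length,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + length]


if __name__ == "__main__":
    short, longer = pad(b"a"), pad(b"a" * 3000)
    print(len(short), len(longer))  # both fall in the same bucket
```

The larger the bucket, the less the length reveals and the more bandwidth is wasted, which is the trade-off described above.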

krisroadruck 3149 days ago

That assumes it's sending audio, and not transcribed text, which is both smaller and much more compressible.
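A back-of-envelope size comparison, as a small Python sketch. The utterance, its 4-second duration, and the 16 kbps bitrate are assumed example values, not measurements of any Echo traffic.

```python
def audio_payload_bytes(duration_s: float, bitrate_kbps: float) -> int:
    """Approximate bytes for compressed speech at a constant bitrate."""
    return round(duration_s * bitrate_kbps * 1000 / 8)


def text_payload_bytes(transcript: str) -> int:
    """Bytes for the same utterance sent as plain UTF-8 text."""
    return len(transcript.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical query and codec settings, chosen only for illustration.
    utterance = "what's the weather like in seattle tomorrow morning"
    audio = audio_payload_bytes(duration_s=4.0, bitrate_kbps=16)
    text = text_payload_bytes(utterance)
    print(f"audio ~{audio} B, text {text} B, ratio ~{audio / text:.0f}x")
```

Extra annotations on the transcript (timestamps, confidence scores and the like) would narrow the gap, so the ratio depends heavily on what "text" actually includes.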

Analog24 3149 days ago

ASR is a hugely complex process that is handled by ML algorithms on Amazon's servers. The Echo simply does not have the hardware to handle this on its own.

krisroadruck 3149 days ago

Is it, though? Not trying to be argumentative, but I remember using Dragon NaturallySpeaking for voice dictation way back in 1998 or so, on a processor next to which today's average smartphone looks like a supercomputer. I thought all the ML stuff was for figuring out context and the like, but straight transcription?

nerpderp83 3149 days ago

Modern voice codecs are extremely compact. An annotated text representation of voice will take up equivalent space.

UncleMeat 3149 days ago

You own the client. You can break any crypto it is doing.

beckler 3150 days ago

I'm sure you could use Wireshark to see what requests are being made. However, they very likely use TLS, so getting the content of those requests would be extremely difficult, if not completely impossible.

However, if you don't mind potentially destroying your echo, I'm sure you could reverse engineer a way to see what's going on.
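Even without decrypting anything, the capture metadata (timestamps and sizes) supports a simple timing check. A minimal Python sketch, assuming you have already exported the device's outbound packets as `(timestamp_s, size_bytes)` pairs and noted when you spoke the wake word; the 8-second window is an arbitrary assumption.

```python
from bisect import bisect_right


def bytes_outside_windows(packets, wake_times, window_s=8.0):
    """Sum outbound bytes not within `window_s` seconds after any wake event.

    packets: iterable of (timestamp_s, size_bytes) for traffic from the device
    wake_times: timestamps (s) at which the wake word was spoken
    """
    starts = sorted(wake_times)
    outside = 0
    for ts, size in packets:
        i = bisect_right(starts, ts) - 1  # most recent wake event at or before ts
        if i < 0 or ts - starts[i] > window_s:
            outside += size
    return outside


if __name__ == "__main__":
    # Hypothetical capture: one query at t=10 s, plus two unrelated packets.
    capture = [(1.0, 500), (10.5, 3000), (12.0, 2000), (40.0, 700)]
    print(bytes_outside_windows(capture, wake_times=[10.0]))
```

Some background traffic (keep-alives, updates) is expected, so a non-zero result is not proof of eavesdropping, and a zero result does not rule out audio piggybacked onto legitimate queries.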

detaro 3150 days ago

As far as I know, only this year's Echo models don't have a known way to root them, so you could likely circumvent the encryption on an older model to inspect traffic. I'm not aware of any publicized results of someone doing that though, and it doesn't necessarily tell you what the backend can and can't extract from the audio data.

johnjac 3150 days ago

https://www.iot-tests.org/2017/06/careless-whisper-does-amaz...

johnjac 3150 days ago

http://www.vnutz.com/articles/Idle_Network_Activity_of_an_Am...