| HN Mirror


by Flammy 3464 days ago

> It would cost more money than God in hardware to store every thing Alexa ever heard

It depends. First of all, storing compressed audio isn't that space-expensive, especially in long-term storage like S3. Additionally, they could be storing only the transcriptions and not the voice behind them, which would be a lot less data.
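To put rough numbers on that, here is a back-of-envelope sketch. Every figure in it is an assumption for illustration, not anything Amazon has published: a 16 kbit/s speech codec, 150 spoken words per minute at about 6 bytes per word, and a placeholder storage price of $0.02 per GB-month.

```python
def audio_bytes(seconds: float, bitrate_kbps: float) -> float:
    """Bytes needed for compressed audio at a constant bitrate (kilobits/second)."""
    return seconds * bitrate_kbps * 1000 / 8


def transcript_bytes(seconds: float,
                     words_per_minute: float = 150,   # assumed speaking rate
                     bytes_per_word: float = 6) -> float:  # assumed average word size
    """Bytes needed for a plain-text transcript of continuous speech."""
    return seconds / 60 * words_per_minute * bytes_per_word


def monthly_cost_usd(total_bytes: float, usd_per_gb_month: float) -> float:
    """Storage cost per month for a given price per decimal gigabyte."""
    return total_bytes / 1e9 * usd_per_gb_month


if __name__ == "__main__":
    day = 24 * 60 * 60                    # one device listening around the clock
    audio = audio_bytes(day, 16)          # assumed 16 kbit/s speech codec
    text = transcript_bytes(day)
    price = 0.02                          # placeholder price, USD per GB-month
    print(f"audio per device-day:      {audio / 1e6:.1f} MB")
    print(f"transcript per device-day: {text / 1e6:.2f} MB")
    print(f"audio cost per device-day: ${monthly_cost_usd(audio, price):.6f}/month")
```

Under those assumptions the transcript is roughly two orders of magnitude smaller than the audio, and that is for nonstop speech. Storing only what follows the wake word would shrink both numbers a lot further.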

We don't know, as Amazon hasn't been very forthcoming about the privacy aspects of Alexa. I personally suspect they are keeping some voice information so they can use it to improve their NLP. I hope they are doing so in a way that is detached from accounts / IDs, but you never know.

Additionally, you can indeed delete a record of the query from the app, but who knows if the voice data or even the query itself is still stored after deletion, just not visible to us end users.

3 comments

troncheadle 3464 days ago

> but who knows if the voice data or even the query itself is still stored after deletion, just not visible to us end users.

Almost definitely yes. I've never known a tech company that truly deletes anything.


api 3464 days ago

"Never really delete" is actually standard advice. There are loads of reasons, mostly non-nefarious, why you may want or need that data.

Sometimes deleted stuff is archived offline or in slow warehouse databases that are not live, etc.


wccrawford 3464 days ago

If it stored everything (and not just requests after the wake word), then it would end up trying to store audio or transcriptions of so many hours of TV and random conversations that it would be ridiculous. And that's just my house. I imagine most people have one somewhere near a TV, and it would do the same.


Flammy 3464 days ago

I'll point out that Facebook is using this kind of always-on recording for advertising purposes, and one of those purposes is to fuel a Nielsen-like TV/movie/audio popularity business.

Basically, Facebook's always-on audio listening in their mobile app (Messenger, I believe, but it might be both these days) was supplying this data. I can't remember the name of the company, but here is another tech company doing the same:

> Symphony uses just one: an app, downloaded to the cellphones of its more than 15,000 panelists. Audio recognition software then picks up whatever people are tuning into, wherever they’re tuning into it: their TV sets, their laptops, or their smartphones. “[It] measures everything you want to measure from one approach,” says Bill Harvey, a media research consultant who’s worked with Symphony

https://theringer.com/tv-ratings-streaming-nielsen-symphony-...


mysterydip 3464 days ago

I would think it would be possible and even beneficial to dedupe the data (15M homes x NFL broadcast, for example). Store the shared content once and link each Echo's text conversion to it when the data is similar but the background noise differs. Or maybe getting data from multiple Echos in different homes at the same time allows for "noise" filtering (people asking different things while the same background noise is present).
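A minimal sketch of what that dedupe could look like, assuming the transcripts from different homes come out identical after light normalization. The `DedupStore` class and its fields are hypothetical names for illustration. Real audio with different background noise would need fuzzy matching or audio fingerprinting, which this exact-hash version does not attempt.

```python
import hashlib


class DedupStore:
    """Stores each distinct transcript once and keeps per-device references to it."""

    def __init__(self) -> None:
        self.blobs: dict[str, str] = {}        # digest -> transcript, stored once
        self.refs: dict[str, list[str]] = {}   # device_id -> digests it heard

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and collapse whitespace so trivial differences still match.
        return " ".join(text.lower().split())

    def add(self, device_id: str, text: str) -> str:
        normalized = self._normalize(text)
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        self.blobs.setdefault(digest, normalized)          # store only if new
        self.refs.setdefault(device_id, []).append(digest)  # always record the link
        return digest


if __name__ == "__main__":
    store = DedupStore()
    store.add("home-a", "Touchdown   Patriots")   # same broadcast, two homes
    store.add("home-b", "touchdown patriots")
    store.add("home-b", "what's the weather")     # a request unique to one home
    print(len(store.blobs), "distinct transcripts for",
          sum(len(v) for v in store.refs.values()), "recordings")
```

The shared broadcast costs one stored copy plus a small reference per home, so the storage for popular content grows with the number of distinct broadcasts, not with the number of devices.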


_Codemonkeyism 3464 days ago

Can't vote you up enough.
