by dixie_land 2744 days ago
That makes sense, though I wonder if, for training purposes, they could simply remove the user identifier (i.e., anonymize the recordings).

Nothing in the article suggests the recordings were linked to any kind of user identifier. All we know is that each recording is linked to the transcription Alexa made of it (which makes sense for both training and logging).

The users were easy to identify because if I listen to the last twenty queries you made to Alexa, then what you ask, what other people say in the background, what you talk about, and what noise is in the background give me a lot of information. That's exactly how they were able to identify some of the users from their recordings.

> Nothing in the article suggests they were linked to any kind of user identifier

They were able to provide the customer with a bundle of his recordings upon request, so they must have maintained such a lookup mapping.

Of course, that led to the mix-up reported here. But I'd argue it's safer if Amazon simply can't fulfill such a request itself because the data is anonymized.

I suppose that does raise the question of where you draw the line on a company's obligation to anonymize information. Is dissociating a user ID from the data enough? What about data that makes the user identifiable through patterns?

If I say "Alexa, my name is Bob Jones and I have chlamydia," they're not really helping me out by just not associating that audio clip with my username.