Hacker News
by tclancy 3464 days ago
I really like the math here, but isn't this a bit pointless? The system wants to parse meaning from audio, and storing just the text it parsed takes far less space. Store the text along with whatever confidence score the machine learning model assigns to the parse, and that sounds like something prosecutors would love to bring into court: "Please read this line and let's see what score you get . . . "
3 comments

For improvements they'd store the raw input so that when a mistake happens they can manually try to figure out why the machine got it wrong (e.g. a hi-hat was hit while they were saying "deuce" so it sounded like "douche").
The raw speech would still be very useful as training data for new and improved models.
It could also store voice waveforms compressed so heavily that any reproduction would sound horrible but would still be at least somewhat intelligible to human listeners.

1200 bits per second is almost enough for toll-quality speech -- and I'm referring to the state of the art a few years ago. Speech codecs are probably better now. But let's stick with 1200 bps. That's enough to store a full year of continuous speech from the vicinity of the device in only about 5 GB.

My guess is that if you cared only about intelligibility and not fidelity, you could do the job with 10%-20% of that space.
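
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes a constant 1200 bps stream recorded around the clock for a 365-day year, decimal gigabytes, and no container or filesystem overhead; the function names are just made up for illustration.

```python
# Back-of-the-envelope storage estimate for continuous low-bitrate speech.
# Assumptions (not from any real device): constant bitrate, 24/7 recording,
# 365-day year, 1 GB = 10**9 bytes, zero container/filesystem overhead.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; ignores leap years


def bytes_per_year(bits_per_second: float) -> float:
    """Bytes needed to store one year of audio at a constant bitrate."""
    return bits_per_second * SECONDS_PER_YEAR / 8


def gigabytes(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


if __name__ == "__main__":
    full = bytes_per_year(1200)
    print(f"1200 bps for a year: {gigabytes(full):.2f} GB")
    # The 10%-20% guess for intelligibility-only storage
    for fraction in (0.10, 0.20):
        print(f"{fraction:.0%} of that: {gigabytes(full * fraction):.2f} GB")
```

That works out to roughly 4.7 GB at 1200 bps, and somewhere between about 0.5 and 1 GB if the 10%-20% guess holds.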

So yes: Alexa could easily be collecting and storing a vast amount of data that isn't immediately transmitted or used.