In fact that's just 40 GB per year, which is pretty doable even on a local SD card that Alexa could be fitted with. Or if it only keeps a rolling month and deletes the oldest day every day, that's still only about 3 GB. Very doable.
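For anyone who wants to check the numbers, here is a rough back-of-the-envelope sketch. It assumes the 10 kbps speech bitrate quoted elsewhere in the thread and continuous 24-hour recording; the function name and constants are just illustrative.

```python
# Storage needed to record continuously at a fixed bitrate.
# Assumption: 10 kbps (10,000 bits per second), recording 24 hours a day.
BITRATE_BPS = 10_000
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_for(days, bitrate_bps=BITRATE_BPS):
    """Bytes needed to store `days` of continuous audio at `bitrate_bps`."""
    return bitrate_bps * SECONDS_PER_DAY * days // 8


per_day_mb = bytes_for(1) / 1e6      # decimal megabytes
per_month_gb = bytes_for(30) / 1e9   # rolling 30-day window
per_year_gb = bytes_for(365) / 1e9   # decimal gigabytes

print(f"per day:   {per_day_mb:.0f} MB")
print(f"per month: {per_month_gb:.2f} GB")
print(f"per year:  {per_year_gb:.2f} GB")
```

That works out to 108 MB per day, about 3.2 GB for a 30-day window, and about 39.4 GB per year, so the rounded figures hold up.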
It is also feasible that the device could just wait, transmit only voice-like audio, and drop everything else... (Even my crummy baby monitor can tell the difference.)
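To give a feel for how cheap this kind of filtering can be, here is a minimal sketch of an energy gate that keeps only frames above a loudness threshold. This is an assumption about the general approach, not how the Echo or any baby monitor actually works, and a plain energy gate only separates loud from quiet; telling speech apart from other loud sounds needs more than this. The frame length, threshold, and synthetic test signal are all made up for illustration.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_frames(samples, frame_len=160, threshold=0.02):
    """Return only the full frames whose RMS energy exceeds `threshold`."""
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) > threshold:
            kept.append(frame)
    return kept


# Synthetic input at an assumed 8 kHz sample rate:
# one second of near-silence followed by one second of a loud 200 Hz tone.
RATE = 8000
quiet = [0.001 * math.sin(2 * math.pi * 50 * n / RATE) for n in range(RATE)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(RATE)]

kept = gate_frames(quiet + loud)
total_frames = (2 * RATE) // 160
print(f"kept {len(kept)} of {total_frames} frames")
```

On this input the gate keeps the loud half and drops the quiet half, so only half the audio would need to be stored or sent.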
Speaking of which, I wonder how much network traffic the Echo actually generates?
Especially since Alexa already converts speech to text! It wouldn't be outside the realm of possibility.
Would it be possible to test this? Check the battery life of the Dot in a completely silent room vs the battery life of a Dot listening to an audiobook played on repeat. If it is actually listening and transcribing it should have a higher power consumption and thus die faster - right?
For NLP research, you'd want something that preserved more information than text.
Questions for anyone in the field: how much is preserved? Is there a representation smaller than audio but richer than text that allows for iterative testing? Maybe the output of a first-pass phoneme decoder? If so, what kind of space requirements would it have?
> Speech can be encoded with less than 10Kbps [1], which means a maximum of 108MB of data per day.
People only speak a few hours per day, so the real figure would be lower. "Interesting" conversations could also be sampled from time to time, and some Alexa stations flagged for full upload, if they want to know more.