| It would cost about $30m a year to store every spoken moment, and far less if you tailor the system to flag specific data for storage (either way you scrap silent moments and use VBR encoding). Storing a year's worth of 96kbps audio takes about 380GB. If you don't record silence and you assume the people around an Alexa are speaking for about a fifth of the day on average (just under 5 hours), that goes down to 76GB a year. So if you then assume 5m Alexas are active at any given point in time, that works out to 380PB (5m x 76GB = 380m GB). Amazon Glacier costs about $88,000 a year per PB, but there's a profit margin included in that, so I'll assume it costs Amazon just $75k a year. That puts storing everything at about $28.5m a year. If you then layer on a flagging system, where only certain users' full record is stored, or only "suspicious incidents" are stored, and you get this down to flagging only 0.1% of all data, you arrive at 380TB of storage, or roughly $28.5k a year. In conclusion, it would cost Amazon at most about $28.5m a year to run such a system. That's certainly within the realm of possibility and of what LE/SIGINT clients would pay; I assume the NSA would gladly pay that sum x100 for that capability. Sounds like it'd be booming business for Amazon. |
It is also the case that a consumer-level service like Glacier presumably has more redundancy than best-effort storage of these recordings would need, since losing a fraction of them wouldn't really be a problem.
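
For anyone who wants to check the back-of-envelope numbers, here is a rough sketch of the arithmetic in Python. The constants are the figures from the estimate above (96kbps, 5m devices, $75k per PB per year, 0.1% flagged); the function and variable names are made up for illustration, and the speech fraction of 0.2 is an assumption picked because it reproduces the 76GB figure. Decimal units are assumed throughout (1 PB = 10^15 bytes). With these inputs, 5m devices at roughly 76GB each come to about 380m GB, which is about 380PB.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def yearly_bytes(bitrate_bps, active_fraction=1.0):
    """Bytes of audio one device produces per year at a constant bitrate."""
    return bitrate_bps / 8 * SECONDS_PER_YEAR * active_fraction


def storage_cost_usd(total_bytes, usd_per_pb_year):
    """Yearly cost of keeping total_bytes, taking 1 PB as 10**15 bytes."""
    return total_bytes / 1e15 * usd_per_pb_year


# Inputs taken from the estimate; all of them are guesses, not measurements.
BITRATE_BPS = 96_000        # 96 kbps audio
SPEECH_FRACTION = 0.2       # assumed share of the day with speech (~4.8 h)
DEVICES = 5_000_000         # assumed number of active devices
USD_PER_PB_YEAR = 75_000    # assumed internal cost per PB per year
FLAGGED_FRACTION = 0.001    # 0.1% of all data is kept

per_device_full = yearly_bytes(BITRATE_BPS)                     # about 378 GB
per_device_speech = yearly_bytes(BITRATE_BPS, SPEECH_FRACTION)  # about 76 GB
fleet_bytes = DEVICES * per_device_speech                       # about 378 PB

cost_all = storage_cost_usd(fleet_bytes, USD_PER_PB_YEAR)
cost_flagged = storage_cost_usd(fleet_bytes * FLAGGED_FRACTION, USD_PER_PB_YEAR)

print(f"per device, continuous:  {per_device_full / 1e9:.0f} GB/year")
print(f"per device, speech only: {per_device_speech / 1e9:.0f} GB/year")
print(f"whole fleet:             {fleet_bytes / 1e15:.0f} PB/year")
print(f"cost, store everything:  ${cost_all:,.0f}/year")
print(f"cost, 0.1% flagged:      ${cost_flagged:,.0f}/year")
```

Changing any one constant (say, 4 hours of speech a day instead of a fifth of the day, which would be a fraction of about 0.167) shifts the totals proportionally, so the order of magnitude matters more than the exact dollar figure.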