Yes, but those aren't records that they actually have. The reason you need a wake word is that the detection is done locally on the device; it's not until you say the wake word that it starts streaming audio data to the central server for processing.
It's certainly possible for Amazon/Google/whoever to send your device a firmware update that turns it into an always-on microphone, but it doesn't do that by default.
Voice recognition is 'on' all the time, as it needs to recognise the keyword.
Not to say it's not a concern, but that's not really how these devices work – at least in the Alexa case. They're just matching for a specific hotword, rather than constantly performing speech-to-text (which is computationally expensive and done remotely). Think of it more like Shazam or the other audio fingerprinting services – you don't have to actually transcribe the speech to know whether a particular word has been heard.
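To make the hotword-gating idea concrete, here is a toy sketch in Python. It is not how Alexa or any real device is implemented: the frame representation, the cosine-similarity matcher, the `threshold` value, and the names `gate_frames` and `utterance_frames` are all assumptions made up for illustration. The point is only that a cheap local match decides which audio frames ever leave the device.

```python
from typing import List, Sequence

Frame = Sequence[float]  # hypothetical: one feature vector per audio frame


def cosine_similarity(a: Frame, b: Frame) -> float:
    """Cheap similarity score between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def gate_frames(
    frames: Sequence[Frame],
    template: Frame,
    threshold: float = 0.9,      # assumed value, not from any real device
    utterance_frames: int = 3,   # assumed: how many frames to send after a match
) -> List[Frame]:
    """Return only the frames that would be streamed to the server."""
    uploaded: List[Frame] = []
    remaining = 0
    for frame in frames:
        if remaining > 0:
            # Wake word was heard recently: this frame is sent upstream.
            uploaded.append(frame)
            remaining -= 1
        elif cosine_similarity(frame, template) >= threshold:
            # Local match on the hotword: open the upload window.
            remaining = utterance_frames
        # Otherwise the frame is dropped on the device and never sent.
    return uploaded
```

In this sketch the matching frame itself is not uploaded, only the frames that follow it; that is a simplification, not a claim about what real devices send.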
You think it is phoning home with transcriptions 24/7? My understanding is that this is probably inaccurate, and it only phones home when the on-board electronics recognize the wake word.
How many KB of text do you speak per day? It is just gonna be noise compared to when the JS framework of the week is updated. Amazon and Google can hide that in ordinary traffic.
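For a rough sense of scale, the arithmetic can be sketched like this. The inputs are placeholders, not measurements: the words-per-day and bytes-per-word figures are assumptions you should swap for your own, and `daily_transcript_kb` is a made-up helper name.

```python
def daily_transcript_kb(words_per_day: int, avg_bytes_per_word: float) -> float:
    """Size of a day's speech as plain text, in kilobytes (1 KB = 1000 bytes)."""
    return words_per_day * avg_bytes_per_word / 1000


# Assumed inputs for illustration only: 10,000 words/day, 6 bytes per word
# (roughly five letters plus a space).
print(daily_transcript_kb(10_000, 6))  # 60.0
```

Under those assumed numbers, a full day of transcribed speech is tens of KB uncompressed, which is the kind of size the comment is pointing at.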