Okay, but have you used the large Whisper model? Sure, voice typing has been around for 10 or 20 years. And it's great if you have a good mic and enunciate, but these new models are insane. You can just mumble something from across an entire room, with peanut butter in your mouth, and it won't miss a single word.
Yeah, you get "Like and Subscribe!" or "Thank you." or even Chinese back from the API if you send pure silence (or I guess it's white noise to the model once its volume is normalized). I think humans hallucinate in white noise or sensory deprivation too; maybe it's related.
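
Rough sketch of the normalization guess, in case it helps. This assumes plain peak normalization, which may not be what the API actually does, and `peak_normalize` is just a made-up helper for illustration. The point is only that a near-silent recording with a tiny noise floor comes out as full-scale noise:

```python
import random

def peak_normalize(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak.

    Hypothetical helper; assumes simple peak normalization.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # true digital silence stays silent
    gain = target_peak / peak
    return [s * gain for s in samples]

random.seed(0)
# "Silence" from a real mic: a noise floor at about 0.1% of full scale.
quiet = [random.uniform(-0.001, 0.001) for _ in range(16000)]
normalized = peak_normalize(quiet)

print(max(abs(s) for s in quiet))       # tiny peak before normalization
print(max(abs(s) for s in normalized))  # peak is now at full scale
```

Exact digital zeros pass through unchanged, so under this assumption the effect would only show up for recorded "silence" that still has some noise in it.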