I've had this same thought and even commented on it directly to friends. The answer I've gotten basically every time is that they send audio messages because they're driving while they're responding.
And most phones, even after 12+ years of voice dictation, badly mangle spoken text when transcribing it and require editing, while the audio stays intact when sent.
So I think there's something more going on. Maybe text used to be the "default" communication medium, and speech is becoming the new default.