Same here. Audio messages are easy to create for the sender but a nightmare to parse for the receiver. Whenever I receive an audio message, I automatically tend to assume that the sender considers their time more valuable than the receiver's. That is acceptable in some cases (e.g., from a busy PhD advisor to an advisee), but I find it unacceptable in others, for example in peer-to-peer communication.
A search for "voice messaging culture in asia" surfaces quite a few articles on the subject. The gist is that the vast majority of your everyday communication with someone is going to be an exchange of short voice messages rather than text messages, in both work and personal contexts. This includes planning to meet someone, ordering food, "catching up", discussing a meeting, etc.
Receiving a long text would not necessarily be rude, just unusual.
+1, but the sad part is that this is trivial to fix with Whisper, yet I'm not seeing the integrations in popular messaging apps. Just put the text blurb in there automatically already!