From some light searching, it's not immediately clear to me how we'd support this. I'm going to keep poking around, but if anyone knows more about how to make this work, I'd love to hear it :)
Instead of WhatsApp, why not build on Telegram? Telegram’s bot APIs are well suited to this type of thing. You can still do it via voice messages [0], or you can even build what Telegram calls a Mini App [1]. And it’s all very straightforward and free!
[0]: https://core.telegram.org/bots/api#voice
[1]: https://core.telegram.org/bots/webapps
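
For anyone curious what the voice-message route looks like, here's a rough sketch (not a full bot). As I understand the Bot API docs, an incoming update carries a `voice` object with a `file_id`, you call `getFile` with that id to get a `file_path`, and then you download the audio from the file endpoint. The helper names (`voice_file_id`, `get_file_url`, `download_url`) and the `TOKEN` placeholder are made up for illustration, and it only builds the URLs rather than making any network calls.

```python
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://api.telegram.org"


def voice_file_id(update: dict) -> Optional[str]:
    """Return the file_id of the voice message in an update, or None if there isn't one."""
    message = update.get("message") or {}
    voice = message.get("voice")
    return voice["file_id"] if voice else None


def get_file_url(token: str, file_id: str) -> str:
    """URL for the getFile method, which responds with a File object containing file_path."""
    return f"{API_ROOT}/bot{token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(token: str, file_path: str) -> str:
    """URL to download the audio itself, given the file_path returned by getFile."""
    return f"{API_ROOT}/file/bot{token}/{file_path}"


# Hypothetical update payload, trimmed to the fields used here.
update = {
    "update_id": 1,
    "message": {"message_id": 2, "voice": {"file_id": "abc123", "duration": 3}},
}

file_id = voice_file_id(update)
if file_id is not None:
    print(get_file_url("TOKEN", file_id))
```

From there you'd fetch the first URL, read `file_path` out of the JSON response, and hand the downloaded audio to whatever does the transcription.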