Basic video is good enough for a lot of purposes. But with minimal gear and software, I can clean up most speakers' audio with very little work (and there are better AI cleanup tools these days as well). For a given quality level, the bar is much lower for audio only.
This really isn't true. Mixing/mastering, if you want to target:
* in-ear devices
* vehicle audio systems
* phone speakers
* laptops
* mid-range home stereo systems
* high-end home stereo/studio monitoring
is quite complex to get right, and generally you can't optimize for more than one at a time. That's even more so if you actually buy into the "immersive audio" hype, where playback is not even stereo anymore.
Audio can certainly get complex. But per the upthread query I'd argue that it's still easier to get understandable audio in an interview in a quiet location than it is to shoot video, especially outside of a studio setting.
and yet ... if the video quality is sub-par people care <--- this much --->, whereas if the audio is sub-par people care <----------- this much ------------->