I believe vision-impaired users would greatly prefer their own TTS, in part because many can listen at 5-10x speed since they're so used to hearing that particular voice.
It sounds as though the author has not actually asked vision-impaired users.
And of course audio is not only for vision-impaired users.
I can easily play audio directly from a webpage at 5-10x speed, but I use command-line programs like ffmpeg/mplayer/mpv/vlc, not a browser like Chrome.
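To make that concrete, here is a rough sketch of the kind of invocation I mean, wrapped in Python so the argument list is explicit. It assumes mpv is installed; the function name and the example.com URL are made up for illustration, and only mpv's documented --speed and --no-video options are used.

```python
import shlex
from typing import List


def mpv_command(url: str, speed: float = 5.0) -> List[str]:
    """Build an mpv argument list that plays `url` audio-only at `speed`x."""
    # mpv documents --speed as accepting values from 0.01 to 100.
    if not 0.01 <= speed <= 100:
        raise ValueError("speed must be between 0.01 and 100")
    # --no-video keeps playback audio-only even if the URL carries video.
    return ["mpv", "--no-video", f"--speed={speed:g}", url]


if __name__ == "__main__":
    # Hypothetical URL; substitute the audio file a site actually links to.
    cmd = mpv_command("https://example.com/article.mp3", speed=5)
    # Print the shell-quoted command instead of running it.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run would actually start playback. The other players have their own speed options, so this is one possible approach, not the only one.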
Assuming a Google employee's belief reflects what all web users actually prefer (doubtful!), websites could offer audio files in a choice of the same TTS voices that screen readers use.
An audio file that contains only the readable text avoids the problem of screen readers trying to read ASCII art, decorative elements, etc.
Some news websites, for example Bloomberg, have been including audio files for years. Wikipedia also offers audio files created with TTS.
The problem is that in this page's case all of the decorative text like borders will be "read out" as well.