Pretty weird choice of TTS engines. None of them are anywhere near state of the art as far as open TTS systems go. XTTSv2 or the new F5-TTS would have been much better choices.
You can always update the code to use those. When Meta releases stuff on GitHub, the goal is not to ship the "best" but to give a proof of concept. The licenses of those TTS systems matter; being open is not enough. If this were a product for their users, they would definitely have better TTS.
"Speech Model experimentation: The TTS model is the limitation of how natural this will sound. This probably be improved with a better pipeline and with the help of someone more knowledgable-PRs are welcome! :)"