A robotic, explicitly non-natural voice would be perfectly acceptable, and even desirable, in many situations[...]it'd be enough if the speech is easy to recognize.
We've had formant synths for several decades, and they're perfectly understandable and require a tiny amount of computing power, but people tend not to want to listen to them:
DECtalk[1,2] would be a much better example; that's as formant as you get.
[1] https://en.wikipedia.org/wiki/DECtalk
[2] https://webspeak.terminal.ink