It will be interesting to see if these made-up AI voices can tell jokes with the same tonality and delivery as good comedians. I'm just a layman, but it feels like a hard problem to solve.
The rightmost column in the first table suggests they might be a long way off from getting timing right. The 5-second sample happens to contain a comma, at which the speaker pauses; that pause shows up in most of the generated outputs, at seemingly random places in the sentence. And the one generated sentence that does have a comma doesn't pause there, either.