There's a personal taste element: I agree with you that certain WaveNet voices sound better (I've actually used them for video narration with some success). The breathing caught me off guard: it took me a minute to identify THAT as the element I was hearing but hadn't been expecting.
The breathing + pausing at commas/full stops and the general cadence were frankly superior to what I've heard from Google Cloud Voice, which is why I was curious whether any preprocessing was done. I've generally had to do multiple manual passes with Google Cloud Voice to get audio output that didn't sound robotic.