Agreed. I recently built an internal application that lets our customer reps play around with ideas using text-to-speech before sending the "copy" to a studio for a professional human recording, and I included both Google WaveNet and Amazon Polly among the available voice synthesis choices. Taken on its own, Polly is plainly mediocre for the most part, and next to WaveNet it's just awful.
I've tried both of them, as well as Microsoft's neural speech and IBM's voices; in the end, Microsoft's sounded the clearest and most natural to me of the four services.
100% agree. Azure voices are the best. I wish Polly would catch up, since most of our stacks are on AWS, but we keep going back to Azure for this one specific thing.