I'm not too sure about that. From my testing, Fluttershy, Applejack, Twilight, Chrysalis, Rise, and Kyu (and a bunch of other characters that I'm surely forgetting) seem to perform phenomenally well. Chrysalis especially: her emotions are extremely believable. And Fluttershy/Applejack/Rise/Kyu have almost zero noise in every generation. This might be the most impressive site I've ever seen.
Oh, I somehow forgot all of the TF2 characters. Some of them do struggle (Medic the most, I think), but everyone else seems incredibly good.
And the Daria characters, too. Honestly, the vast majority of characters are already near-perfect.
Hrm. Well, I can't really argue with that, beyond saying that my standards for "perfect" might be different.
I think some of the best voices they have are characters like Twilight; she shows a ton of promise. But as it stands right now, I would still at least hesitate to use Twilight's voice in a project unless I didn't have other options. Chrysalis's voice is good, but again, she's an exaggerated cartoon character with a large amount of inflection. I would not use her voice in its current state without a lot of post-processing. Someone like the Spy I would consider unusable: it sounds to me like the character needs to clear their throat or something, and it has a lot of strange artifacts. I would definitely consider the 10th Doctor unusable, even for just a hobby project or a voice assistant.
But... I don't know, maybe this is subjective. I can't just tell you that what you're hearing is wrong, if you like the results then you like the results :)
And again, I don't want to detract from how impressive they are. They are incredibly impressive, particularly because of how characters like Chrysalis emote. Extremely promising. But I still think there's a difference between 'impressive' and 'believable deepfake'.
Yeah, that's fair. I dunno, I can't really hear anything wrong with Fluttershy or Applejack no matter how hard I try, but your ears are probably much better than mine :p
It seems to struggle more and more as the voices get less cartoony/exaggerated.