Text-to-speech voices from ~15 minutes of audio from YouTube videos

Using transfer learning on top of a DCTTS (deep convolutional text-to-speech) model, I wanted to see how quickly one could recreate a voice even remotely convincingly.

TL;DR/W: using ~15 minutes of audio and about 1.5 hours of training, I was able to create what I think are pretty good examples of the voices of myself, Donald Trump, Barack Obama, Elon Musk, and Joe Rogan.

None are perfect, and this is very much still a work in progress, but it's worth noting that this capability exists now (and has for years).

Even if you don't post videos of yourself on YouTube, recordings of your voice are almost certainly stored, tagged with your name, by Google (Assistant), Apple (Siri), Amazon (Alexa), and probably many other providers.