Not to step on any toes here (I've starred whisperspeech b/c it really is amazing and I intend to use it), but you should also check out Tortoise [1]. IMO the quality is a little better (for now), but it is painfully slow: even with KV caching, it doesn't quite reach real time on my 4090 except with very short snippets.
[1] https://github.com/neonbjb/tortoise-tts