| HN Mirror


	by mahmoudfelfel 1343 days ago
	The original model (https://play.ht/blog/introducing-truly-realistic-text-to-spe...) was trained on 50k hours of audio. The voices above were only fine-tuned from that model, on just 4-6 hours of audio each. We recently fine-tuned another voice with only 1 hour, though... I think eventually (soon) we will only need 15-20 minutes, with zero-shot and no fine-tuning at all.