HN Mirror

Hacker News


	by mazoza 596 days ago
	I don't actually see any tokens used in the model. It seems like the model predicts latents directly, and then a VAE converts them back to audio. That makes it more like Tortoise or XTTS.
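To make the token-versus-latent distinction concrete, here is a toy sketch in Python. It assumes a pipeline of the shape described above: an acoustic model emits real-valued latent frames, and a decoder maps each frame to waveform samples. For contrast, a token-based pipeline would pick an index into a finite codebook for each frame. Every name, dimension, and the linear "decoder" here is hypothetical and for illustration only. None of it is the model's actual code, and a real VAE decoder is a trained neural network, not a random linear map.

```python
import random
from typing import List

Vector = List[float]

LATENT_DIM = 4         # hypothetical size of one latent frame
SAMPLES_PER_FRAME = 8  # hypothetical number of waveform samples per latent frame


def predict_latents(text: str, frames_per_char: int = 2, seed: int = 0) -> List[Vector]:
    """Stand-in for the acoustic model: emits real-valued latent vectors directly.

    There is no softmax over a vocabulary and no codebook lookup,
    so nothing produced here is a discrete token.
    """
    rng = random.Random(seed)  # seeded so the toy output is deterministic
    n_frames = len(text) * frames_per_char
    return [[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(n_frames)]


def make_decoder_weights(seed: int = 1) -> List[Vector]:
    """Random linear weights standing in for a trained VAE decoder."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.5) for _ in range(LATENT_DIM)] for _ in range(SAMPLES_PER_FRAME)]


def vae_decode(latents: List[Vector], weights: List[Vector]) -> List[float]:
    """Map each latent frame to SAMPLES_PER_FRAME waveform samples (toy linear decoder)."""
    audio: List[float] = []
    for z in latents:
        for row in weights:  # one weight row per output sample
            audio.append(sum(w * x for w, x in zip(row, z)))
    return audio


def tokens_to_latents(token_ids: List[int], codebook: List[Vector]) -> List[Vector]:
    """The token-based alternative: each frame is just an index into a finite codebook."""
    return [codebook[i] for i in token_ids]


if __name__ == "__main__":
    latents = predict_latents("hello")
    audio = vae_decode(latents, make_decoder_weights())
    print(len(latents), len(audio))  # 10 frames -> 80 samples
```

The point of the contrast: in the latent path the model regresses continuous vectors, so any value is possible. In the token path every frame is restricted to one of the codebook entries, which is what "using tokens" usually means for these speech models.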