| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gagan2020 51 days ago
	It is not good for text to speech (TTS) as well. I am trying it for few days. First of all 1.5B model documentation is not there. 0.5B realtime is shit model. I was converting text, line by line and it was randomly adding music and couldn't handle special characters like "…". I really disappointed with this model to say the least.

3 comments

SyneRyder 49 days ago

> ...it was randomly adding music...

I've been noticing this with the Mistral Voxtral TTS models too. I have my AI record a morning briefing podcast for myself, and occasionally there are sounds like music at the start (the british voice had a musical tone underneath that sounded a little like the end of the BBC News theme). I don't think I've ever encountered that with the OpenAI TTS models, so they're now my default go-to again.

link

tjungblut 51 days ago

yep, it seems this was trained on large amount of podcasts with ad jingles or phone call queues with elevator music. I was also pretty disappointed to run the TTS last week.

link

Stagnant 51 days ago

The 7B parameter Vibevoice TTS model is still the most impressive local TTS model i've tried. It was pulled by Microsoft a few days after its release due to "abuse potential" but it can be found in various community maintained huggingface repos.

link