by nopelynopington 263 days ago
If this lives up to the demo, it's a huge development for anyone looking to do realistic TTS without paying to use an API.

There are quite a few fairly low-overhead models around that do that in real time these days.
But how many of them support voice cloning?

(Genuine question; I haven't seen any other than this one.)

Microsoft's VibeVoice.
VibeVoice (according to the repo description) is currently unavailable due to "misuse". But my impression was that it required a significant amount of VRAM (>8GB)? Or that it wasn't suitable for on-device use on low-spec devices.
It's unavailable from their repo, but it was released under an open license and mirrors exist. I'm not sure what the VRAM requirements are.
According to this issue[0] the 1.5B model needs 6GB of VRAM. Meanwhile it looks like NeuTTS is designed to be able to run on CPU, which is nice for older/lower-spec hardware.

0: https://github.com/microsoft/VibeVoice/issues/26#issuecommen...

no