by kristopolous 259 days ago
There are quite a number of fairly low-overhead models around that do that in real time these days.

But how many of them support voice cloning?

(Genuine question; I haven't seen any other than this one.)

Microsoft's VibeVoice.
VibeVoice (according to the repo description) is currently unavailable due to "misuse". But my impression was that it required a significant amount of VRAM (>8GB), or that it wasn't suitable for on-device use on low-spec hardware?
It's unavailable from their repo, but it was released under an open license and mirrors exist. I'm not sure what the VRAM requirements are.
According to this issue[0] the 1.5B model needs 6GB of VRAM. Meanwhile it looks like NeuTTS is designed to be able to run on CPU, which is nice for older/lower-spec hardware.

0: https://github.com/microsoft/VibeVoice/issues/26#issuecommen...

no