| HN Mirror


	by stephenheron 460 days ago
	It's quite disappointing that their speech-to-text models are not open source. Whisper was really good, and it was great that it was open to play around with. I guess this continues OpenAI's approach of not really being open!

1 comments

nickthegreek 460 days ago

Indeed. Right now I think our open choices are Piper, Kokoro and Orpheus.

GaggiX 460 days ago

He was talking about STT models, not TTS. Whisper is open source and a good solution in many cases (fine-tuned variants in particular).

pzo 460 days ago

Regarding STT, we also got two new models from Nvidia today:

https://huggingface.co/nvidia/canary-180m-flash

https://huggingface.co/nvidia/canary-1b-flash

Ranked second on the Open ASR leaderboard: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

Sadly, only 4 languages are supported (English, German, Spanish, French).

DrPhish 460 days ago

In my opinion, GPT-SoVITS is the best if you can put in the effort. I'm still using v2 since the output is so good. It's also the best multilingual one in my testing on Japanese inputs.

nickthegreek 460 days ago

Hadn't messed with that one before. My needs are more real-time, for a voice assistant, but it was neat to play with on Hugging Face.

https://huggingface.co/spaces/lj1995/GPT-SoVITS-v2

pzo 460 days ago

Can it support more languages than just English, Chinese, Japanese, and Korean?
