by jallenjia 524 days ago

I'm excited to share Kokoro TTS, an open-source text-to-speech model we've been working on. Despite its relatively small size (82M parameters), it achieves impressive results in natural speech synthesis, ranking first in the TTS Spaces Arena benchmark.

The model is Apache 2.0 licensed and trained on less than 100 hours of audio data. It supports both American and British English, offering multiple voice options with natural emotional expression and 24kHz audio output.

We've deployed a demo at kokorotts.online where you can try it out. I'd really appreciate any feedback from the HN community on both the model's performance and potential applications.

Tech stack: StyleTTS 2 architecture, ONNX runtime, Next.js for the web interface.

3 comments

kissgyorgy 524 days ago

It's NOT Open Source.

dontdoxxme 524 days ago

Confusing messaging. A previous version is at https://huggingface.co/hexgrad/Kokoro-82M (it matches the demo if you use the "TTS v0.19" tab; it has some artefacts in the voice[1] and definitely doesn't sound as good as the latest version).

"There currently isn't a release date scheduled for the other voices"

[1]: https://huggingface.co/blog/hexgrad/kokoro-short-burst-upgra...

vanous 524 days ago

And it's not offline.

CGamesPlay 524 days ago

In which sense? https://huggingface.co/hexgrad/Kokoro-82M

- Apache 2.0 weights in this repository

- MIT inference code in spaces/hexgrad/Kokoro-TTS adapted from yl4579/StyleTTS2

- GPLv3 dependency in espeak-ng
