by jallenjia 524 days ago
I'm excited to share Kokoro TTS, an open-source text-to-speech model we've been working on. Despite its relatively small size (82M parameters), it achieves impressive results in natural speech synthesis, ranking first in the TTS Spaces Arena benchmark.

The model is Apache 2.0 licensed and trained on less than 100 hours of audio data. It supports both American and British English, offering multiple voice options with natural emotional expression and 24kHz audio output.
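For anyone planning to save the 24kHz output to disk, here is a minimal sketch using only the Python standard library. It is not taken from the Kokoro code: the helper name `float_to_wav_bytes` is hypothetical, and a generated sine tone stands in for real model output. It also assumes the samples arrive as floats in [-1.0, 1.0], which may not match what the model actually returns.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, the output rate stated in the post


def float_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode float samples in [-1.0, 1.0] as mono 16-bit PCM WAV bytes.

    Hypothetical helper for illustration; not part of any Kokoro API.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        # Clamp each sample, scale to the int16 range, pack little-endian.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(pcm)
    return buf.getvalue()


# Stand-in for model output: half a second of a 440 Hz tone.
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
wav_bytes = float_to_wav_bytes(tone)
```

Writing `wav_bytes` to a file with a `.wav` extension should give something any audio player can open. The sample rate in the header has to match the rate the audio was generated at, otherwise playback will sound sped up or slowed down.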

We've deployed a demo at kokorotts.online where you can try it out. I'd really appreciate any feedback from the HN community on both the model's performance and potential applications.

Tech stack: StyleTTS 2 architecture, ONNX runtime, Next.js for the web interface.

3 comments

It's NOT Open Source.
The messaging is confusing. A previous version is at https://huggingface.co/hexgrad/Kokoro-82M (it matches the demo if you use the "TTS v0.19" tab; it has some artefacts in the voice[1] and definitely doesn't sound as good as the latest version).

"There currently isn't a release date scheduled for the other voices"

[1]: https://huggingface.co/blog/hexgrad/kokoro-short-burst-upgra...

And it's not offline.
In which sense? https://huggingface.co/hexgrad/Kokoro-82M

- Apache 2.0 weights in this repository

- MIT inference code in spaces/hexgrad/Kokoro-TTS adapted from yl4579/StyleTTS2

- GPLv3 dependency in espeak-ng

That's not the model repository advertised in the post.
The website is not from the authors. It seems fraudulent.

HF: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

Where is the code?