Hacker News
by mrkn1 31 days ago
Thank you. You nailed the actual value. The real win is just knowing you can do this on a laptop CPU, offline, with no GPU or cloud bill. There are small done-for-you details, like rescaling token timestamps back to real time after the atempo speedup so --timestamps doesn't lie to you, but they're minor.
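For anyone curious what the timestamp fix involves: ffmpeg's `atempo` filter (e.g. `-filter:a atempo=1.5`) plays audio faster by the given factor. A token that the recognizer places at t seconds in the sped-up audio therefore occurred at t × factor seconds in the original. Below is a minimal Python sketch. It assumes timestamps arrive as (token, start-seconds) pairs. The function name `rescale_timestamps` and that data shape are hypothetical and are not the tool's actual code.

```python
from typing import List, Tuple

# Hypothetical shape: (token text, start time in seconds) as reported on the sped-up audio.
TokenTime = Tuple[str, float]


def rescale_timestamps(tokens: List[TokenTime], atempo: float) -> List[TokenTime]:
    """Map timestamps from atempo-accelerated audio back to original time.

    With atempo=1.5 the audio plays 1.5x faster, so each position in the
    sped-up file corresponds to 1.5x that position in the original recording.
    """
    if atempo <= 0:
        raise ValueError("atempo factor must be positive")
    return [(text, start * atempo) for text, start in tokens]


if __name__ == "__main__":
    sped_up = [("hello", 2.0), ("world", 10.0)]
    print(rescale_timestamps(sped_up, 1.5))  # [('hello', 3.0), ('world', 15.0)]
```

If several `atempo` filters are chained, the effective speedup is the product of their factors, so that product is the value to pass in.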

Why the choice of Kroko over something like parakeet-tdt-0.6b-v3, which is also faster than realtime on CPU?
Kroko models are more accurate, and they're only about a hundred megabytes, compared to Parakeet's 2.5 gigabytes (in default fp32).
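The 2.5 GB figure is easy to sanity-check, because raw weight size is roughly parameter count × bytes per parameter. Below is a back-of-the-envelope Python sketch. It assumes about 0.6 billion parameters, inferred from the "0.6b" in the model name rather than from an exact count, and it ignores file-format overhead. `weights_size_gb` is a hypothetical helper.

```python
# Bytes needed to store one weight at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_size_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Assumption: ~0.6B parameters, taken from the model name, not an exact count.
    for dtype in ("fp32", "fp16", "int8"):
        print(dtype, round(weights_size_gb(0.6e9, dtype), 2), "GB")
```

At fp32 this comes to about 2.4 GB, close to the quoted 2.5 GB. The gap is plausibly down to the exact parameter count and rounding. By the same arithmetic, fp16 or int8 copies of the weights would be roughly a half or a quarter of that size.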
Do you have a link to results confirming this? Kroko does not seem to be on the Open ASR Leaderboard. Parakeet has an average WER of 6.32% across several common datasets.
Kroko's website says benchmarks aren't formalized yet. FWIW, this page claims 5% WER for English [0], though it doesn't specify the dataset, so it's not directly comparable to Parakeet's 6.32% on the Open ASR Leaderboard.

The best way to judge is to try it on your own audio.
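If you test on your own audio, WER is the number to compare. It is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model's output, divided by the number of words in the reference. Below is a minimal Python sketch. `word_error_rate` is a hypothetical helper that only lowercases and splits on whitespace. Published benchmarks typically apply fuller text normalization (punctuation, numbers) before scoring, so these scores are not directly comparable to leaderboard figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (classic Levenshtein, over words).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words -> WER of 1/6.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In practice you would hand-transcribe a few representative clips, run each model on them, and compare the scores. Averaging over more clips makes the comparison less noisy.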

[0] https://huggingface.co/hudaiapa88/sherpa-stt-onnx