by ipotapov
39 days ago
Interesting that you use WhisperKit for local transcription. We built something comparable in speech-swift (which I maintain), focusing on on-device ASR with Qwen3-ASR, which supports 52 languages and achieves an RTF of 0.06 on Apple Silicon. The tradeoff is full native Swift async integration. https://github.com/soniqo/speech-swift
|
What we found is that for super-fast tap-to-speak and paste, WhisperKit is already close to instant (basically realtime on Apple Silicon). Faster-than-realtime transcription is mostly useful only for batch processing of audio, which is not really our product.