by rpdaiml 41 days ago
How are you handling the on-device speech pipeline, especially the tradeoffs between model size, latency, and accuracy on consumer hardware?
1 comment

Currently the on-device models, such as Parakeet and Whisper, are great for English: faster than cloud-hosted models, though a little less accurate. If you switch on post-processing, the ASR output goes through a fine-tuned Qwen 3.5 model that improves accuracy, formatting, etc. All of the code is open source, so feel free to inspect it and suggest perf improvements as a PR!
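
For anyone skimming, here is a rough sketch of the two-stage flow described above: ASR first, then an optional cleanup pass. It is not the project's actual code. `SpeechPipeline`, `fake_asr`, and `fake_cleanup` are hypothetical names made up for illustration, and the two stub functions only stand in for the on-device ASR model and the fine-tuned post-processing model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechPipeline:
    """Hypothetical two-stage pipeline: ASR, then optional text cleanup."""
    transcribe: Callable[[bytes], str]   # stand-in for the on-device ASR model
    post_process: Callable[[str], str]   # stand-in for the fine-tuned LLM pass
    use_post_processing: bool = False    # the "switch" mentioned in the comment

    def run(self, audio: bytes) -> str:
        text = self.transcribe(audio)
        # The cleanup stage only runs when the user has opted in, so the
        # default path pays no extra latency for the second model.
        if self.use_post_processing:
            text = self.post_process(text)
        return text


def fake_asr(audio: bytes) -> str:
    # Stub: a real backend would decode the audio; this returns fixed text.
    return "hello world how are you"


def fake_cleanup(text: str) -> str:
    # Stub: capitalize the first letter and add a period, mimicking the kind
    # of formatting fix a post-processing model might apply.
    return text[:1].upper() + text[1:] + "."


if __name__ == "__main__":
    pipeline = SpeechPipeline(fake_asr, fake_cleanup)
    print(pipeline.run(b""))             # raw ASR text
    pipeline.use_post_processing = True
    print(pipeline.run(b""))             # cleaned-up text
```

Keeping the second stage behind a flag is one way to let users pick between lower latency and better formatting; the real project may well structure this differently.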