It's not fast without pre-segmentation, as they do in WhisperX. It actually has terrible transcription speed. For a speedup we have to use CTranslate2 kernels. The decoding code is also a mess, which makes it hard to plug in your own custom language model. Not to mention that streaming ASR requires even more tweaks. Whisper Small is very fast but quite inaccurate. If you deploy Whisper on a GPU that costs around a dollar per hour, you really need to ensure that the cost savings are worth it.
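To make the cost point concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function names, the idea of a "realtime factor" (hours of audio transcribed per wall-clock hour of GPU time), the utilization term, and all the numbers. You would have to measure throughput on your own hardware and plug in your own prices.

```python
def cost_per_audio_hour(gpu_cost_per_hour: float,
                        realtime_factor: float,
                        utilization: float = 1.0) -> float:
    """Dollars spent to transcribe one hour of audio on a rented GPU.

    realtime_factor: hours of audio transcribed per wall-clock GPU hour
                     (hypothetical; measure this for your model and setup).
    utilization:     fraction of paid GPU time spent actually transcribing.
    """
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Effective throughput shrinks when the GPU sits idle part of the time.
    return gpu_cost_per_hour / (realtime_factor * utilization)


def breakeven_realtime_factor(gpu_cost_per_hour: float,
                              alternative_cost_per_audio_hour: float) -> float:
    """Realtime factor needed (at full utilization) to match the price
    of some alternative, e.g. a hosted transcription service."""
    if alternative_cost_per_audio_hour <= 0:
        raise ValueError("alternative cost must be positive")
    return gpu_cost_per_hour / alternative_cost_per_audio_hour


if __name__ == "__main__":
    # Illustrative numbers only: a $1/hr GPU, 10x realtime, busy half the time.
    print(cost_per_audio_hour(1.0, realtime_factor=10.0, utilization=0.5))
    # Speed needed to match a made-up alternative priced at $0.25 per audio hour.
    print(breakeven_realtime_factor(1.0, 0.25))
```

The takeaway is that the GPU price alone tells you nothing; the realtime factor and how busy you keep the GPU decide whether self-hosting actually saves money.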

That said, all of this is from a production lens. For personal use, honestly nothing is as easy to use as Whisper (it even works on a laptop).