by mightytravels 924 days ago
Use this Whisper derivative repo instead - one hour of audio gets transcribed within a minute or less on most GPUs - https://github.com/Vaibhavs10/insanely-fast-whisper

Anecdotally I've found ctranslate2 to be even faster than insanely-fast-whisper. On an L4, using ctranslate2 with a batch size as low as 4 beats all their benchmarks except the A100 with flash attention 2.

It's a shame faster-whisper never landed batch mode, as I think that's preventing folks from trying ctranslate2 more easily.

Could someone elaborate on how this is accomplished, and whether there is any quality disparity compared to the original?

With repos like https://github.com/SYSTRAN/faster-whisper it makes immediate sense why they're faster than the original implementation, and lots of others get there by lowering quantization precision etc. (with worse results).

But for this one, it's not very clear how it gets its speedup, especially considering it's even faster.

The Acknowledgments section on the page that GP shared says it's using BetterTransformer. https://huggingface.co/docs/optimum/bettertransformer/overvi...
From what I can see, it's parallel batch processing; the default batch size for that repo is 24. You can reduce the batch size, and with a batch size of 1 it's as fast or slow as Whisper. Quality is exactly the same (the same large model is used).
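
To make the batching point concrete, here is a rough sketch of the arithmetic, not the repo's actual code. It assumes the audio is cut into non-overlapping 30-second windows at 16 kHz (real pipelines typically overlap chunks slightly), and the helper names `split_into_chunks` and `make_batches` are made up for illustration.

```python
SAMPLE_RATE = 16_000   # Whisper models take 16 kHz audio
CHUNK_SECONDS = 30     # Whisper works on 30-second windows


def split_into_chunks(num_samples, chunk_seconds=CHUNK_SECONDS, sample_rate=SAMPLE_RATE):
    """Return (start, end) sample offsets for consecutive chunks.

    Simplification: no overlap between chunks (an assumption of this sketch).
    """
    chunk_len = chunk_seconds * sample_rate
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


def make_batches(chunks, batch_size):
    """Group chunks into lists of at most batch_size items.

    Each batch would go through the model in a single forward pass.
    """
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]


one_hour = 60 * 60 * SAMPLE_RATE          # number of samples in one hour of audio
chunks = split_into_chunks(one_hour)

print(len(chunks))                        # 120 chunks of 30 seconds
print(len(make_batches(chunks, 24)))      # 5 batched passes at batch size 24
print(len(make_batches(chunks, 1)))       # 120 passes at batch size 1
```

So with a batch size of 24, an hour of audio needs 5 batched passes instead of 120 sequential ones. The model and its weights are unchanged, which is why the output quality should not differ; the speedup depends on the GPU having enough memory and compute to process a batch in parallel.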