| HN Mirror



	by mightytravels 924 days ago
	Use this Whisper derivative repo instead - one hour of audio gets transcribed within a minute or less on most GPUs - https://github.com/Vaibhavs10/insanely-fast-whisper

2 comments

claytonjy 924 days ago

Anecdotally I've found ctranslate2 to be even faster than insanely-fast-whisper. On an L4, using ctranslate2 with a batch size as low as 4 beats all their benchmarks except the A100 with flash attention 2.

It's a shame faster-whisper never landed batch mode, as I think that's preventing folks from trying ctranslate2 more easily.


thrdbndndn 924 days ago

Could someone elaborate on how this is accomplished and whether there is any quality disparity compared to the original?

Repos like https://github.com/SYSTRAN/faster-whisper make immediate sense as to why they're faster than the original implementation, and lots of others get there by lowering quantization precision etc. (with worse results).

But for this one, it's not very clear how, especially considering it's even faster.


lern_too_spel 924 days ago

The Acknowledgments section on the page that GP shared says it's using BetterTransformer. https://huggingface.co/docs/optimum/bettertransformer/overvi...


mightytravels 924 days ago

From what I can see, it is parallel batch processing; the default batch size for that repo is 24. You can reduce the batch size, and if you use 1 it's as fast (or slow) as Whisper. Quality is exactly the same (same large model used).
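
A rough sketch of why the batch size matters, assuming the audio is cut into 30-second windows (Whisper's input length) and that all windows in a batch are decoded in one pass. The helper names chunk_spans and batches are made up for illustration and are not taken from the repo; this only counts passes and does not run a model.

```python
import math

def chunk_spans(duration_s, chunk_s=30.0):
    """Split [0, duration_s) into consecutive windows of at most chunk_s seconds.

    chunk_s=30.0 is an assumption matching Whisper's 30-second input window.
    """
    n = math.ceil(duration_s / chunk_s)
    return [(i * chunk_s, min((i + 1) * chunk_s, duration_s)) for i in range(n)]

def batches(items, batch_size):
    """Group items into lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

spans = chunk_spans(3600.0)        # one hour of audio
parallel = batches(spans, 24)      # batch size 24, the default mentioned above
sequential = batches(spans, 1)     # batch size 1, one window per pass

print(len(spans), len(parallel), len(sequential))  # 120 5 120
```

Fewer passes only turns into a wall-clock speedup when the GPU has enough memory and spare compute to process the whole batch in parallel, which is presumably why results differ across cards.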
