by thrdbndndn 924 days ago

Could someone elaborate on how this is accomplished, and whether there is any quality disparity compared to the original?

With repos like https://github.com/SYSTRAN/faster-whisper it makes immediate sense why they're faster than the original implementation, and lots of others get there by lowering quantization precision etc. (with worse results).

But with this one, it's not very clear how, especially considering it's even faster.

2 comments

lern_too_spel 924 days ago

The Acknowledgments section on the page that GP shared says it's using BetterTransformer. https://huggingface.co/docs/optimum/bettertransformer/overvi...


mightytravels 924 days ago

From what I can see it is parallel batch processing - the default batch size for that repo is 24. You can reduce the batch size, and if you use 1 it's as fast (or slow) as the original Whisper. Quality is exactly the same (the same large model is used).
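
To make the batching idea concrete, here is a rough sketch in plain Python. It is not the repo's code: the function names are hypothetical, and it assumes the audio is simply cut into fixed 30-second windows (Whisper's input length), with no overlap or stride between chunks. It only counts how many model forward passes would be needed at a given batch size.

```python
CHUNK_SECONDS = 30.0  # Whisper consumes audio in 30-second windows


def split_into_chunks(duration_s, chunk_s=CHUNK_SECONDS):
    """Return (start, end) times covering the audio in fixed-size windows.

    Assumption: no overlap between neighbouring chunks, to keep the sketch simple.
    """
    chunks = []
    start = 0.0
    while start < duration_s:
        chunks.append((start, min(start + chunk_s, duration_s)))
        start += chunk_s
    return chunks


def make_batches(chunks, batch_size):
    """Group chunks so that each group can go through the model in one pass."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]


def forward_passes(duration_s, batch_size):
    """Number of model calls needed to cover the whole audio (hypothetical helper)."""
    return len(make_batches(split_into_chunks(duration_s), batch_size))


one_hour = 3600.0
print(forward_passes(one_hour, batch_size=24))  # 120 chunks in groups of 24
print(forward_passes(one_hour, batch_size=1))   # one chunk per model call
```

With one hour of audio that is 120 chunks, so 5 passes at batch size 24 versus 120 passes at batch size 1. Fewer passes does not translate into a proportional speedup, since each batched pass does more work; the gain comes from keeping the GPU busier per call.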
