by thrdbndndn 924 days ago
Could someone elaborate on how this is accomplished, and whether there is any quality difference compared to the original?

With repos like https://github.com/SYSTRAN/faster-whisper, it makes immediate sense why they're faster than the original implementation, and lots of others get their speedup by lowering quantization precision etc. (at the cost of worse results).

But with this one, it's not very clear how, especially considering it's even much faster.
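As a rough illustration of the precision trade-off mentioned above, here is a minimal Python sketch of symmetric linear quantization. This scheme is an assumption chosen for simplicity; real libraries typically use per-channel scales, calibration, and other refinements, and this is not how faster-whisper or any specific repo does it. The weights are hypothetical random values. Fewer bits means a coarser grid, so the worst-case round-trip error grows as the integer bit width shrinks.

```python
import random
from typing import List, Tuple


def quantize(weights: List[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric linear quantization of floats to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map the integers back to approximate float values."""
    return [v * scale for v in q]


def max_error(weights: List[float], bits: int) -> float:
    """Largest absolute difference between original and round-tripped weights."""
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    return max(abs(a - b) for a, b in zip(weights, restored))


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical layer weights drawn from a small Gaussian.
    weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit max error: {max_error(weights, bits):.2e}")
```

The error is bounded by half the quantization step (`scale / 2`), so for the same weights the bound grows by a factor of 127/7 (roughly 18x) when going from 8 to 4 bits.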


The Acknowledgments section on the page that GP shared says it's using BetterTransformer. https://huggingface.co/docs/optimum/bettertransformer/overvi...
From what I can see, it uses parallel batch processing; the default batch size in that repo is 24. You can reduce the batch size, and if you set it to 1 it's as fast (or as slow) as Whisper. Quality is exactly the same (the same large model is used).
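To make the batching explanation concrete, here is a minimal Python sketch. It assumes the audio is cut into independent 30-second chunks (Whisper's input window) and that each batch costs one model call; `decode_batch` and `fake_decode` are hypothetical placeholders, not the repo's API. On a GPU with spare capacity, one pass over a batch of 24 chunks can take not much longer than a pass over a single chunk, which is roughly where the speedup would come from.

```python
from typing import Callable, List, Sequence, Tuple

# Whisper models take audio in fixed 30-second windows.
CHUNK_SECONDS = 30.0

Chunk = Tuple[float, float]  # (start, end) offsets in seconds


def split_into_chunks(duration_s: float, chunk_s: float = CHUNK_SECONDS) -> List[Chunk]:
    """Cut a recording into consecutive windows of at most chunk_s seconds."""
    chunks: List[Chunk] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def batched(items: Sequence[Chunk], batch_size: int) -> List[List[Chunk]]:
    """Group items into lists of at most batch_size elements."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def transcribe(
    duration_s: float,
    batch_size: int,
    decode_batch: Callable[[List[Chunk]], List[str]],
) -> Tuple[List[str], int]:
    """Decode every chunk, one model call per batch; return texts and call count."""
    texts: List[str] = []
    calls = 0
    for batch in batched(split_into_chunks(duration_s), batch_size):
        texts.extend(decode_batch(batch))  # hypothetical: one forward pass per batch
        calls += 1
    return texts, calls


def fake_decode(batch: List[Chunk]) -> List[str]:
    """Hypothetical stand-in for the model: labels each chunk by its time span."""
    return [f"[{start:.0f}-{end:.0f}s]" for start, end in batch]


if __name__ == "__main__":
    for bs in (24, 1):
        texts, calls = transcribe(3600, bs, fake_decode)
        print(f"batch_size={bs}: {len(texts)} chunks, {calls} model calls")
```

For a one-hour file this yields 120 chunks: 5 model calls at batch size 24 versus 120 at batch size 1. Every chunk goes through the same weights either way; batching only changes how many of them share a forward pass, which is consistent with batch size 1 behaving like plain Whisper.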