|
|
|
|
|
by regularfry
967 days ago
|
|
Oh yes, that's absolutely true - faster is better for everyone. It's just that this particular breakpoint would put realtime transcription on a $17 device with an amazing support ecosystem. It's wild. That being said, even with this distillation there's still the aspect that Whisper isn't really designed for streaming. It's fairly simplistic and always deals with 30 second windows. I was expecting there to have been some sort of useful transform you could do to the model to avoid quite so much reprocessing per frame, but other than https://github.com/mit-han-lab/streaming-llm (which I'm not even sure directly helps) I haven't noticed anything out there. |
|