Yes. Fine-tuning a Whisper model on an RPi 5 is ~2x faster than on the RPi 4. Other stages, such as data pre-processing with HF datasets, are again 2x-3x faster.
I’m also interested in people’s experiences. I’d expect decent performance: Whisper 3 comes in many model sizes, down to 35M parameters, iirc. Training, and especially inference, should be doable on a Pi 5.
Nitpick, but important: v2 and v3 exist only for the large model. It's the same Whisper; only the large model has been updated (large-v2, large-v3).
All of the other model sizes are still from the original release.
I reread your comment multiple times and still don’t understand the important nitpick. Are you saying that the smaller models haven’t been updated alongside the Whisper 3 release? That makes the most sense to me, but I don’t want to misinterpret what you mean!
Yes. The example uses Whisper-tiny, which has 39M parameters and is a perfect match for the downstream task of keyword spotting. Only one line of the code needs to change to run a larger Whisper model :)
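The example's code isn't shown in this thread, so here is a hypothetical sketch of what that "one line" could look like. It assumes checkpoints are named like the Hugging Face Hub IDs `openai/whisper-<size>`; the parameter counts are from the openai/whisper README. It also reflects the nitpick above: only the large size has versioned variants.

```python
# Hypothetical sketch: the "one line" that picks which Whisper checkpoint to use.
# Assumes Hugging Face Hub IDs of the form "openai/whisper-<size>".
# Parameter counts (millions) are from the openai/whisper README.
WHISPER_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
    # Only the large size has updated versions (v2, v3), same parameter count.
    "large-v2": 1550,
    "large-v3": 1550,
}


def checkpoint_id(size: str) -> str:
    """Return the assumed Hub ID for a Whisper size, rejecting unknown sizes."""
    if size not in WHISPER_PARAMS_M:
        raise ValueError(f"unknown Whisper size: {size!r}")
    return f"openai/whisper-{size}"


MODEL_SIZE = "tiny"  # change this one line, e.g. to "small" or "large-v3"
model_id = checkpoint_id(MODEL_SIZE)
print(model_id, f"({WHISPER_PARAMS_M[MODEL_SIZE]}M parameters)")
# If the example uses Hugging Face transformers (an assumption), model_id is
# what would be passed to WhisperForConditionalGeneration.from_pretrained().
```

Asking for something like "medium-v3" raises an error, since no such checkpoint exists.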
I don’t think you’re going to have a good time running the large model on a Pi of any kind.
The large models are 32x slower than the tiny models, roughly.[0]
I just tested, and whisper.cpp on my Pi 4 can transcribe the 30-second a13.wav sample (“make samples” to fetch it) in 18.5 seconds.
You can do the math: 18.5 s × 32 ≈ 592 s, or about 10 minutes to transcribe 30 seconds of audio with the large model. Not a good time for most people.
The Pi 5 could be 2x to 3x faster.
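A small back-of-envelope sketch of that math, using only the figures quoted here. It assumes the 18.5 s Pi 4 measurement was with the tiny model (as the 32x comparison implies) and that both the 32x large-vs-tiny slowdown and a 2x-3x Pi 5 speedup scale linearly, which real hardware may not do.

```python
# Back-of-envelope estimate using only the figures quoted in this thread.
PI4_TINY_SECONDS = 18.5  # whisper.cpp on a Pi 4, 30 s a13.wav sample (assumed tiny model)
AUDIO_SECONDS = 30.0     # length of the a13.wav sample
LARGE_VS_TINY = 32       # "roughly 32x slower", per the Whisper README table cited


def large_model_seconds(tiny_seconds: float, slowdown: float, pi_speedup: float = 1.0) -> float:
    """Scale a measured tiny-model time to the large model, optionally on a faster Pi."""
    return tiny_seconds * slowdown / pi_speedup


for label, speedup in [("Pi 4", 1.0), ("Pi 5 (2x)", 2.0), ("Pi 5 (3x)", 3.0)]:
    secs = large_model_seconds(PI4_TINY_SECONDS, LARGE_VS_TINY, speedup)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min), "
          f"{secs / AUDIO_SECONDS:.1f}x slower than real time")
```

This prints 592 s (9.9 min) for the Pi 4 and roughly 3.3 to 4.9 minutes for a Pi 5, so even the faster board would still run several times slower than real time with the large model.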
[0]: https://github.com/openai/whisper/blob/main/README.md#availa...