by janalsncm 364 days ago
> I wonder if there's a way to automatically detect how "fast" a person talks in an audio file

Transcribe it locally using Whisper and output tokens/sec?

1 comment

Just count syllables per second by doing an FFT plus some basic analysis.
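
For what it's worth, here is a rough sketch of one way "FFT plus some basic analysis" could go. It is not a tested syllable counter. The assumption (mine, not the commenter's) is that syllables show up as slow bumps in the loudness envelope, so the strongest envelope modulation between about 2 and 8 Hz is a proxy for syllables per second. The function names, the 100 Hz frame rate, and the 2-8 Hz band are all illustrative choices. To stay dependency-free it evaluates the DFT directly at a few candidate frequencies; with NumPy you would likely reach for `numpy.fft.rfft` instead.

```python
import cmath
import math


def frame_energy(samples, sample_rate, frame_hz=100):
    """Short-time energy envelope, one value per 1/frame_hz seconds."""
    hop = sample_rate // frame_hz
    return [
        sum(x * x for x in samples[i:i + hop]) / hop
        for i in range(0, len(samples) - hop + 1, hop)
    ]


def dominant_modulation_hz(envelope, frame_hz=100, lo=2.0, hi=8.0, step=0.1):
    """Frequency in [lo, hi] Hz where the envelope's DFT magnitude is largest.

    Assumption: this peak roughly tracks the syllable rate. Real speech has
    pauses and uneven stress, so treat the result as a crude estimate.
    """
    if not envelope:
        raise ValueError("envelope is empty")
    mean = sum(envelope) / len(envelope)
    centered = [e - mean for e in envelope]  # drop the DC component
    best_f, best_mag = lo, -1.0
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        f = lo + k * step
        # Direct DFT at one frequency (not a fast transform, but the
        # envelope is short so this is cheap).
        acc = sum(
            c * cmath.exp(-2j * math.pi * f * n / frame_hz)
            for n, c in enumerate(centered)
        )
        if abs(acc) > best_mag:
            best_f, best_mag = f, abs(acc)
    return best_f
```

Feed it mono PCM samples as floats, e.g. `dominant_modulation_hz(frame_energy(samples, 16000))`. The heavy part is one pass over the samples, which is where the "faster to calculate" argument comes from.
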
> FFT plus some basic analysis

Yeah, totally easier than `len(transcribe(a))/len(a)`
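
Spelled out a little more, and only as a sketch: taken literally, `len(transcribe(a))/len(a)` is characters per sample. The version below assumes the result dict returned by the `openai-whisper` package's `model.transcribe()`, where each entry in `result["segments"]` carries `start`, `end`, and `tokens`. The helper name `tokens_per_second` is made up here, and dividing by summed segment time (rather than file length) is my choice so that long silences don't drag the rate down.

```python
def tokens_per_second(result):
    """Tokens per second of transcribed speech from a Whisper-style result.

    Expects result["segments"] to be a list of dicts with "start" and "end"
    (seconds) and "tokens" (list of token ids). The token lists may include
    special tokens, so this is a rough count, not an exact word rate.
    """
    segments = result.get("segments", [])
    n_tokens = sum(len(seg["tokens"]) for seg in segments)
    spoken = sum(seg["end"] - seg["start"] for seg in segments)
    return n_tokens / spoken if spoken > 0 else 0.0


# Usage, assuming openai-whisper is installed and "talk.mp3" exists
# (both are assumptions for illustration):
#
#   import whisper
#   model = whisper.load_model("base")
#   print(tokens_per_second(model.transcribe("talk.mp3")))
```
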

Maybe not as quick to code up, but way faster to compute.

The tokens/sec values can be used as ground-truth labels for an FFT -> small neural net model.