by zachthewf 873 days ago
Hard for me to imagine that this could be solved in text space. I think the prediction task needs to be done on the audio.
1 comment

We thought about doing this in Whisper itself, since it's already working in the audio space.