Hacker News
by neverokay 753 days ago
I’d add that I got better accuracy using smaller chunks (about 20 seconds per WAV file). Whisper seems to go berserk if you pump in lengthy audio (30+ seconds).

I’d be tempted to at least try breaking the notes down into one-line images (about a sentence each) and give it a go with Gemini. I haven’t tested their OCR, but even if it makes errors, I bet you could just ask Gemini again to fix up the sentence.

1 comment

Whisper works on 30s chunks iirc. You need to use something that's automatically splitting up your input if it's longer.
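
For anyone who wants to try the splitting by hand, here is a minimal sketch using only Python's standard `wave` module. It assumes an uncompressed PCM WAV input; the function name `split_wav`, the `chunk_NNNN.wav` naming, and the 20-second default are illustrative choices, not anything Whisper requires. It cuts at fixed time offsets, so a chunk boundary can land in the middle of a word.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=20.0):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds.

    Hypothetical helper for illustration; returns the chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(src), "rb") as reader:
        channels = reader.getnchannels()
        sample_width = reader.getsampwidth()
        frame_rate = reader.getframerate()
        frames_per_chunk = int(frame_rate * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small for this sample rate")
        index = 0
        while True:
            # readframes returns fewer frames (or none) at the end of the file.
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break
            path = out_dir / f"chunk_{index:04d}.wav"
            with wave.open(str(path), "wb") as writer:
                writer.setnchannels(channels)
                writer.setsampwidth(sample_width)
                writer.setframerate(frame_rate)
                writer.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

Usage would look like `split_wav("notes.wav", "chunks")`, after which each chunk file can be transcribed separately and the text joined in order.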