Hacker News
by overfeed 241 days ago
> AI also really has trouble with transcribing my speech. I noticed that as early as the '90s with early speech recognition software. It was completely unusable.

I don't know what your transcription use cases are, but you may be able to get an improvement by fine-tuning Whisper. This would require about $4 in training costs[1], and a dataset with 5-10 hours of your labeled (transcribed) speech, which may be the bigger hurdle[2].

1. 2000 steps took me 6 hours on an A100 on Colab, fine-tuning openai/whisper-large-v3 on 12 hours of data. I can share my notebook/script with you if you'd like.

2. I am working on a PWA that makes it simple for humans to correct the mistakes in initial, automated transcriptions and feed the corrected dataset back into the pipeline for fine-tuning, but it's not ready yet.
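
For anyone sanity-checking the numbers in footnote 1, here is a rough back-of-the-envelope sketch. It assumes the "$4" figure refers to the same 6-hour A100 run, which the comment implies but does not state outright; the function names are made up for illustration.

```python
# Figures quoted in the comment above. Pairing the $4 cost with the
# 6-hour run is an assumption, not something the post states explicitly.
TRAINING_COST_USD = 4.0
TRAINING_HOURS = 6.0
TRAINING_STEPS = 2000


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Cost per GPU-hour implied by a total cost and a wall-clock duration."""
    return cost_usd / hours


def seconds_per_step(hours: float, steps: int) -> float:
    """Average wall-clock seconds spent on each training step."""
    return hours * 3600 / steps


if __name__ == "__main__":
    rate = implied_hourly_rate(TRAINING_COST_USD, TRAINING_HOURS)
    pace = seconds_per_step(TRAINING_HOURS, TRAINING_STEPS)
    print(f"implied rate: ${rate:.2f} per GPU-hour")
    print(f"average pace: {pace:.1f} seconds per step")
```

Under that assumption the run works out to roughly two-thirds of a dollar per GPU-hour and a little under 11 seconds per step. Actual pricing depends on the plan and may differ.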

3 comments

Any chance you could github your script for public use anyway?

It's an interesting self-contained example

We have a PWA for this at:

https://www.psyome.com/annotator

It is desktop-only - do you have plans to support mobile browsers? My PWA is mobile-first.
FWIW, it might be usable on mobile, but I haven't tested it.
Oh nice! No plans at the moment
> with 5-10 hours of your labeled (transcribed) speech, which may be the bigger hurdle[2].

Can’t you just read from a known script?
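
If reading from a known script is workable, the labeling step mostly reduces to pairing each recording with the line that was read. A minimal sketch of that idea, using only the standard library: the sentence splitter is deliberately naive, and the `recordings` directory, the `clip_0001.wav` naming scheme, and the `audio`/`text` field names are all hypothetical, not something any particular fine-tuning script requires.

```python
import json
import re
from pathlib import Path


def split_into_prompts(script_text: str) -> list[str]:
    """Split a script into sentence-sized prompts to read aloud.

    Naive rule: break after '.', '!' or '?' when followed by whitespace.
    """
    parts = re.split(r"(?<=[.!?])\s+", script_text.strip())
    return [p.strip() for p in parts if p.strip()]


def build_manifest(prompts: list[str], audio_dir: str, stem: str = "clip") -> list[dict]:
    """Pair each prompt with the file name its recording is expected to have."""
    rows = []
    for i, text in enumerate(prompts, start=1):
        # Hypothetical naming scheme: clip_0001.wav, clip_0002.wav, ...
        audio_path = Path(audio_dir) / f"{stem}_{i:04d}.wav"
        rows.append({"audio": audio_path.as_posix(), "text": text})
    return rows


def to_jsonl(rows: list[dict]) -> str:
    """Serialize the manifest as one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


if __name__ == "__main__":
    script = "The quick brown fox jumps over the lazy dog. Is it raining today? Call me back!"
    manifest = build_manifest(split_into_prompts(script), "recordings")
    print(to_jsonl(manifest))
```

Record one clip per prompt in the same order, and the manifest already holds the transcript for each file, so no after-the-fact transcription is needed.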