
by overfeed 288 days ago

> AI also really has trouble with transcribing my speech. I noticed that as early as the '90s with early speech recognition software. It was completely unusable.

I don't know what your transcription use cases are, but you may be able to get an improvement by fine-tuning Whisper. This would require about $4 in training costs[1], and a dataset with 5-10 hours of your labeled (transcribed) speech, which may be the bigger hurdle[2].

1. 2000 steps took me 6 hours on an A100 on Colab, fine-tuning openai/whisper-large-v3 on 12 hours of data. I can share my notebook/script with you if you'd like.

2. I am working on a PWA that makes it simple for humans to fix the mistakes in initial, automated transcriptions and feed the corrected dataset back into the pipeline for fine-tuning, but it's not ready yet.
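
Since the dataset is likely the bigger hurdle, here is a rough sketch for checking how many hours of labeled speech you have so far. It is not the notebook mentioned above. It assumes a hypothetical layout where each clip is a PCM WAV file with a same-named .txt transcript next to it; the function names and the 5-hour threshold (the low end of the 5-10 hour range) are illustrative choices.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Return the duration of a single PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def dataset_hours(root):
    """Sum the duration (in hours) of WAV clips that have a matching .txt transcript."""
    total_seconds = 0.0
    for wav_path in sorted(Path(root).rglob("*.wav")):
        # Assumed layout: clip.wav sits next to clip.txt; unlabeled clips are skipped.
        if wav_path.with_suffix(".txt").exists():
            total_seconds += wav_duration_seconds(wav_path)
    return total_seconds / 3600.0


def enough_data(hours, minimum_hours=5.0):
    """True once the labeled audio reaches the assumed minimum number of hours."""
    return hours >= minimum_hours
```

Calling `dataset_hours("my_recordings")` on such a folder and passing the result to `enough_data` gives a quick yes/no on whether it is worth starting a fine-tuning run.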

3 comments

mjburgess 288 days ago

Any chance you could github your script for public use anyway?

It's an interesting self-contained example


erikerikson 288 days ago

We have a PWA for this at:

https://www.psyome.com/annotator


overfeed 287 days ago

It is desktop-only - do you have plans to support mobile browsers? My PWA is mobile-first.
