| HN Mirror

Real-time transcription doesn't necessarily mean short snippets. In my experience, the initial prompt is useless beyond the first 30-second window unless its words actually come up in every 30-second window, including the first one.
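A rough way to see why: as I understand the openai-whisper reference implementation, audio is decoded in 30-second windows, each window is conditioned on the initial prompt plus the transcript so far, and that prompt is truncated to its most recent tokens. The sketch below is a simplified model of that behavior, not Whisper's actual code. It counts words instead of tokens, and `prompts_per_window` and the `max_prompt` cap are hypothetical stand-ins for the real token limit.

```python
from typing import List


def prompts_per_window(
    initial_prompt: List[str],
    windows: List[List[str]],
    max_prompt: int,
) -> List[List[str]]:
    """Return the (simplified, word-level) prompt each window is conditioned on.

    Hypothetical model: the prompt for a window is the last `max_prompt`
    words of (initial prompt + everything transcribed before that window).
    """
    history = list(initial_prompt)
    prompts = []
    for words in windows:
        # Only the most recent words survive the truncation.
        prompts.append(history[-max_prompt:] if max_prompt > 0 else [])
        # The transcript of this window becomes context for the next one.
        history.extend(words)
    return prompts


if __name__ == "__main__":
    initial = ["Kubernetes", "Grafana"]
    windows = [list("abcde"), list("fghij"), list("klmno")]
    for i, p in enumerate(prompts_per_window(initial, windows, max_prompt=6)):
        print(i, p)
```

With a cap of six words, the custom vocabulary is already half gone by the second window and entirely gone by the third, unless those words show up again in the transcript itself and get carried forward that way.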

It may be easy to rattle off a list of words, but that doesn't work nearly as well as it should, so what's the point? Also, I never said fine-tuning would be easier than prompting; I said it would be better. It would only need to be easier than fine-tuning is today, not easier than prompting.

The fine-tuning I'm talking about would not be limited to a few new words. You would still need only one model, as we do today; it would simply be your own model, one that knows all the specific words and spellings you prefer. By analogy with other machine learning models, I would expect a lightweight LoRA approach to work as well.
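For anyone unfamiliar with why LoRA counts as "lightweight": it freezes a pretrained weight matrix W and learns only a low-rank update, so a layer computes W x + (alpha / r) · B (A x), with B starting at zero so the adapted model initially behaves exactly like the base model. The pure-Python sketch below shows that idea for a single linear layer. It is not Whisper-specific, and the function names and layer sizes are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x: List[float], W: Matrix, A: Matrix, B: Matrix, alpha: float) -> List[float]:
    """Frozen base layer W plus a trainable low-rank update B @ A.

    W: d_out x d_in (frozen), A: r x d_in, B: d_out x r (both trainable).
    """
    r = len(A)
    base = matvec(W, x)               # output of the frozen pretrained layer
    delta = matvec(B, matvec(A, x))   # low-rank correction through rank r
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def trainable_params(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """Parameters updated by full fine-tuning vs. by a rank-r LoRA adapter."""
    return d_in * d_out, r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]
    print(lora_forward([2.0, 3.0], W, A, [[0.0], [0.0]], alpha=1.0))  # B = 0: same as base
    print(trainable_params(1024, 1024, 8))
```

For a hypothetical 1024 x 1024 layer at rank 8, the adapter trains 16,384 parameters instead of 1,048,576. That is why a per-user adapter holding someone's preferred vocabulary and spellings seems plausible to store and swap in.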

I just haven't seen anyone working on solutions like these, which would actually scale in a way the initial prompt doesn't.

The initial prompt works in extremely specific scenarios, but in my experience it has been so unreliable for long transcripts that I no longer bother with it. Someone mentioned Alexa-style home assistants; their audio snippets would be short enough for the initial prompt to actually be useful.