by jilijeanlouis 963 days ago
Thanks for mentioning Gladia. This is not exactly how it works, however: our version of Whisper is modified from the original to avoid hallucinations, and we are releasing a new model in a few days that is even better in this respect. Also worth mentioning are three main problems that occur in real time: endpointing, context reinjection (while avoiding hallucinations, which is a major issue with Whisper, since prompt injection generates a lot of hallucinations in general), and finally alignment. Timestamps are extremely important in real time if you want to realign with the original streamed audio. Whisper tends to be hard to handle on all of these fronts.
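
To make the alignment point concrete, here is a minimal sketch of the bookkeeping involved. It assumes audio is transcribed in chunks and that each chunk's timestamps come back relative to the start of that chunk. The `Segment` type and `realign` function are hypothetical names for illustration, not Gladia's or Whisper's API.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, relative to the start of the chunk
    end: float    # seconds, relative to the start of the chunk
    text: str


def realign(segments, chunk_offset_s):
    """Shift chunk-relative timestamps onto the original stream's timeline.

    chunk_offset_s is where the chunk began in the stream, in seconds
    (an assumption: the caller tracks this while cutting chunks).
    """
    return [
        Segment(s.start + chunk_offset_s, s.end + chunk_offset_s, s.text)
        for s in segments
    ]


# A chunk that started 30 s into the stream.
chunk_segments = [Segment(0.5, 1.25, "hello"), Segment(1.5, 2.0, "world")]
stream_segments = realign(chunk_segments, 30.0)
```

This only covers the offset arithmetic. If the model's timestamps drift or are hallucinated, the shifted values will be wrong too, which is why reliable timestamps matter so much here.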