


	by nikvaes 1356 days ago
	The problem for Jigasi's speech-to-text feature with Whisper, or any recent SOTA speech-to-text neural network, is that these models are transformer-based. One of the key strengths of transformers is that the attention mechanism makes them very good at processing a sequence. But attention, as these models use it, needs to see the whole input sequence before it can produce output. So it's difficult to adapt these architectures to perform well in real-time scenarios like captioning meetings.
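
	To make the "needs to see the whole input" point concrete, here is a toy sketch in plain Python. It is not Whisper's or Jigasi's code: the `attention` function, the scalar "frames", and the use of the raw frame value as query, key, and value are all simplifying assumptions for illustration. It compares full attention with a causally masked variant when more audio arrives.

```python
import math

def attention(seq, causal=False):
    """Toy single-head self-attention over a list of scalar frames.

    Assumption for illustration: query, key and value are all the raw
    frame value, with no learned weights.
    """
    out = []
    for i, q in enumerate(seq):
        # With causal=True, position i may only look at frames 0..i.
        visible = seq[: i + 1] if causal else seq
        scores = [q * k for k in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, visible)) / total)
    return out

audio_so_far = [0.2, -0.1, 0.4]
audio_later = audio_so_far + [0.9, -0.7]  # two more frames arrive

full_before = attention(audio_so_far)
full_after = attention(audio_later)
causal_before = attention(audio_so_far, causal=True)
causal_after = attention(audio_later, causal=True)

# Full attention: do the outputs for the first three frames change
# once new audio arrives?
full_changed = any(abs(a - b) > 1e-9 for a, b in zip(full_before, full_after))

# Causal attention: do the outputs for the first three frames stay the same?
causal_stable = all(
    abs(a - b) < 1e-12 for a, b in zip(causal_before, causal_after[:3])
)

print(full_changed, causal_stable)
```

	With full attention, the outputs for frames you have already processed shift every time new audio comes in, so you either wait for the whole segment or keep recomputing. The causal variant is stable but never gets to use future context, which is roughly the trade-off that streaming adaptations have to deal with.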

1 comment

pen2l 1356 days ago

Yes! But part of the Jitsi ecosystem enables recordings, and Whisper is a good candidate to use for these recorded sessions.

On that topic: they record sessions in an interesting way. Basically an instance of Chrome is started and captured... I think with OBS. That always made me raise an eyebrow, but I also can’t think of a better way.

edit: It's actually Jibri which has to do with recording. Gosh I wish the names were a liiiittle more intuitive. :)
