by nikvaes
1356 days ago
The problem for Jigasi's speech-to-text feature with Whisper, or any recent SOTA speech-to-text neural network, is that these models are transformer-based. One of the key strengths of transformers is that the attention mechanism makes them very good at processing sequences. But attention, as used in these models, needs to see the whole input sequence, so it's difficult to adapt these architectures to perform well in real-time scenarios like captioning meetings.
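To make the attention point concrete, here's a toy NumPy sketch. It is not Whisper or Jigasi code: the single head, the identity projections, and the names `self_attention` and `first_frame_is_stable` are all made up for illustration. It checks whether the output for the first audio frame stays the same once later frames arrive, with full attention and with a causal mask.

```python
import numpy as np

def self_attention(x, causal=False):
    """Toy single-head self-attention with identity projections (an assumption)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    if causal:
        # Mask future positions so frame i only attends to frames <= i.
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Row-wise softmax, shifted by the row max for numerical stability.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
frames = rng.normal(size=(6, 4))  # 6 made-up audio frames, 4 features each

def first_frame_is_stable(causal):
    """Does frame 0's output stay the same after 3 more frames arrive?"""
    partial = self_attention(frames[:3], causal)[0]
    full = self_attention(frames, causal)[0]
    return bool(np.allclose(partial, full))

print("full attention:", first_frame_is_stable(causal=False))
print("causal attention:", first_frame_is_stable(causal=True))
```

With full attention the first frame's output shifts every time new audio comes in, so nothing can be finalized until the whole window is available. With the causal mask it stays fixed, which is roughly the property a streaming captioner wants, at the cost of not using future context.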
On that topic, they record sessions in an interesting way: basically an instance of Chrome is started and captured... I think with OBS. That always made me raise an eyebrow, but I also can't think of a better way.
edit: It's actually Jibri which has to do with recording. Gosh I wish the names were a liiiittle more intuitive. :)