by gronky_
437 days ago
I just tried the demo on the homepage and I don’t know what kind of sorcery this is, but it’s blowing my mind. I input a bunch of completely made-up words (Quastral Syncing, Zarnix Meshing, HIBAX, Bilxer), used them in a sentence, and the model zero-shotted perfect speech recognition! It’s so counterintuitive to me that this would work. I would have bet that you’d have to provide at least one audio sample for the model to recognize a word it was never trained on. Being given a word only as text and then recognizing it in audio must be an emergent property.