by will-burner
575 days ago
Is there any reason this would work better than, or is needed over, a simple two-step pipeline: (1) run ASR on the audio, with Whisper for instance, then (2) apply an NER model to the transcribed text? There are open-source NER models that can identify any specified entity type (https://universal-ner.github.io/, https://github.com/urchade/GLiNER). I don't see why the WhisperNER approach would be any better than running Whisper for ASR and then applying one of these NER models.
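For concreteness, the two-step pipeline I mean looks roughly like the sketch below. It assumes the `openai-whisper` and `gliner` Python packages. The `urchade/gliner_base` checkpoint name, the `"base"` Whisper size, and all of the helper names (`cascade_ner`, `whisper_transcriber`, `gliner_tagger`) are placeholders of mine, not anything from the WhisperNER work.

```python
from typing import Callable, Dict, List


def cascade_ner(
    audio_path: str,
    labels: List[str],
    transcribe: Callable[[str], str],
    tag: Callable[[str, List[str]], List[Dict]],
) -> Dict:
    """Step 1: audio -> text. Step 2: text -> entities of the requested types."""
    text = transcribe(audio_path)
    entities = tag(text, labels)
    return {"text": text, "entities": entities}


def whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    # Assumption: the openai-whisper package is installed.
    import whisper

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"]


def gliner_tagger(
    model_name: str = "urchade/gliner_base",  # assumed checkpoint name
) -> Callable[[str, List[str]], List[Dict]]:
    # Assumption: the gliner package is installed.
    from gliner import GLiNER

    model = GLiNER.from_pretrained(model_name)
    return lambda text, labels: model.predict_entities(text, labels)
```

Usage would be something like `cascade_ner("call.wav", ["person", "city"], whisper_transcriber(), gliner_tagger())`, where `call.wav` is a made-up file name. The two stages are passed in as plain functions, so either model can be swapped without touching the other.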