by noahkay13
108 days ago
I built a C++ inference engine for NVIDIA's Parakeet speech recognition models using Axiom (https://github.com/Frikallo/axiom), my tensor library. What it does:
- Runs 7 model families: offline transcription (CTC, RNNT, TDT, TDT-CTC), streaming (EOU, Nemotron), and speaker diarization (Sortformer)
- Word-level timestamps
- Streaming transcription from microphone input
- Speaker diarization detecting up to 4 speakers