There are only two audio transcription models listed. Is this generally true? Are there no open-source ones, like Llama but for transcription? Or is it just a small dataset on that site?
It looks like the site only lists hosted models from major providers, not every model available on Hugging Face, civit.ai, etc. Looking at the image generation and chat lists, there are many models on Hugging Face that are not listed.
Note: Text-to-Speech and Audio Transcription/Automatic Speech Recognition models can be trained on the same data. They currently have to be trained separately because the models are structured differently. One of the challenges is training time, as the data can run to hundreds of hours of audio.
There are lots and lots of models, covering various use cases (e.g., on-device, streaming/low-latency, specific languages). People somehow think OpenAI invented audio transcription with Whisper in 2022, when other models exist and have been used in production for decades (Whisper is the only one listed on that website).
See https://huggingface.co/models?pipeline_tag=automatic-speech-...