And someone has already converted it to onnx format: https://huggingface.co/eschmidbauer/cohere-transcribe-03-202... - so it can be run on CPU instead of GPU.
This kids make sense because "compiling" (training) the model cost inhibitly much, and we can still benefit from the artifacts.
And someone has already converted it to onnx format: https://huggingface.co/eschmidbauer/cohere-transcribe-03-202... - so it can be run on CPU instead of GPU.