by v7n
598 days ago
It's not exactly what OP wants out of the box, but if anyone is considering building one, I suggest taking a look at this.¹ It is really easy to tinker with and can run either on-device or in a client-server model.
It has the required speech-to-text and text-to-speech components, with multiple built-in options for each. If you can get the LLM stage of the pipeline to perform translation to a degree you're comfortable with, this could be a solution.

¹ https://github.com/huggingface/speech-to-speech

https://neuml.hashnode.dev/speech-to-speech-rag
https://www.youtube.com/watch?v=tH8QWwkVMKA
One would just need to remove the RAG piece and use a Translation pipeline (https://neuml.github.io/txtai/pipeline/text/translation/). They'd also need to use a Korean TTS model.
Both this and the Hugging Face speech-to-speech project are written in Python, though.
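To make the "swap the RAG step for translation" idea concrete, here's a rough sketch of the three-stage chain (speech-to-text → translate → text-to-speech). This is not code from either project. `SpeechTranslator` and the `fake_*` stages are hypothetical stand-ins so it runs without any models installed; in practice you'd plug in a real STT model, a translation model and a Korean TTS voice.

```python
from dataclasses import dataclass
from typing import Callable

# Each stage is a plain callable so a real model can be dropped in later.
Transcribe = Callable[[bytes], str]   # audio in -> source-language text
Translate = Callable[[str], str]      # source text -> target-language text
Synthesize = Callable[[str], bytes]   # target text -> audio out


@dataclass
class SpeechTranslator:
    """Hypothetical wrapper: run STT, then translation, then TTS in order."""
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize

    def __call__(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)
        translated = self.translate(text)  # this is where the RAG step used to sit
        return self.synthesize(translated)


# Hypothetical stand-ins: "audio" is just UTF-8 bytes here.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Toy phrasebook lookup; unknown phrases pass through unchanged.
    phrasebook = {"hello": "안녕하세요"}
    return phrasebook.get(text.lower(), text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslator(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline(b"Hello")
```

For the translate stage, the txtai docs linked above show usage along the lines of `Translation()(text, "ko")` via `from txtai.pipeline import Translation`; check that page for the exact parameters before relying on it.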