I built RealtimeVoiceChat because I was frustrated with the latency in most voice AI interactions. It is an open-source (MIT license) system designed for real-time, local voice conversations with LLMs.

Quick demo video (50s): https://www.youtube.com/watch?v=HM_IQuuuPX8

The goal is to get closer to natural conversation speed. It uses audio chunk streaming over WebSockets, RealtimeSTT (based on Whisper), and RealtimeTTS (supporting engines like Coqui XTTSv2/Kokoro) to achieve around 500ms response latency, even when running larger local models like a 24B Mistral fine-tune via Ollama.

Key aspects:

- Designed for local LLMs (Ollama primarily, OpenAI connector included).
- Interruptible conversation.
- Smart turn detection to avoid cutting the user off mid-thought.
- Dockerized setup available for easier dependency management.

It requires a decent CUDA-enabled GPU for good performance due to the STT/TTS models.

Would love to hear your feedback on the approach, performance, potential optimizations, or any features you think are essential for a good local voice AI experience.

The code is here: https://github.com/KoljaB/RealtimeVoiceChat
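To make the chunk-streaming idea concrete, here is a minimal Python sketch (not code from the repo) of slicing raw microphone audio into small fixed-duration chunks, one per WebSocket message. The 16 kHz, 16-bit mono PCM format and the 32 ms chunk length are assumptions chosen for illustration; `chunk_size_bytes` and `iter_chunks` are hypothetical helpers.

```python
from collections.abc import Iterator

SAMPLE_RATE_HZ = 16_000   # assumed: Whisper-family models consume 16 kHz audio
BYTES_PER_SAMPLE = 2      # assumed: 16-bit signed PCM, mono
CHUNK_MS = 32             # hypothetical chunk duration


def chunk_size_bytes(sample_rate_hz: int, bytes_per_sample: int, chunk_ms: int) -> int:
    """Number of bytes that cover chunk_ms milliseconds of mono audio."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample


def iter_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    size = chunk_size_bytes(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHUNK_MS)
    one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)  # 1 s of silence
    chunks = list(iter_chunks(one_second, size))
    print(size, len(chunks), len(chunks[-1]))  # prints: 1024 32 256
```

With these assumed numbers each message carries 1024 bytes, so the server receives fresh audio roughly every 32 ms (plus network time) instead of waiting for the whole utterance before it can start transcribing.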
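Interruption (barge-in) can be pictured as a race between reply playback and a "user started speaking" signal. The asyncio sketch below is a hypothetical illustration, not RealtimeVoiceChat's code: `play_reply` stands in for streaming TTS output, `simulate_user` stands in for a voice-activity detector, and the 50 ms-per-word timing is made up. Whichever task finishes first wins, and any playback still running is cancelled.

```python
import asyncio
from typing import Optional


async def play_reply(text: str, spoken: list[str], word_s: float = 0.05) -> None:
    """Stand-in for streaming TTS playback: emits one word every word_s seconds."""
    for word in text.split():
        spoken.append(word)
        await asyncio.sleep(word_s)


async def run_turn(text: str, barge_in_after_s: Optional[float]) -> list[str]:
    """Play a reply, cancelling it if the (simulated) user starts speaking."""
    spoken: list[str] = []
    user_speaking = asyncio.Event()

    async def simulate_user() -> None:
        if barge_in_after_s is not None:
            await asyncio.sleep(barge_in_after_s)
            user_speaking.set()  # in a real system a VAD callback would fire this

    playback = asyncio.create_task(play_reply(text, spoken))
    listener = asyncio.create_task(user_speaking.wait())
    user = asyncio.create_task(simulate_user())

    # Return as soon as playback ends or the user interrupts.
    await asyncio.wait({playback, listener}, return_when=asyncio.FIRST_COMPLETED)

    # Whatever is still pending lost the race: cancel it and wait for cleanup.
    for task in (playback, listener, user):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, listener, user, return_exceptions=True)
    return spoken


if __name__ == "__main__":
    reply = "one two three four five six seven eight"
    print(asyncio.run(run_turn(reply, barge_in_after_s=None)))  # full reply
    print(asyncio.run(run_turn(reply, barge_in_after_s=0.12)))  # cut short
```

Cancelling the playback task (rather than letting it finish in the background) is what keeps the assistant from talking over the user once they start speaking.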
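A common heuristic for not cutting the user off mid-thought is to make the required end-of-turn silence depend on how complete the partial transcript looks. The sketch below is an illustration under stated assumptions, not the project's actual turn-detection logic: the millisecond thresholds, the `FILLER_WORDS` list, and both function names are hypothetical.

```python
FILLER_WORDS = {"and", "but", "so", "because", "um", "uh"}  # hypothetical list


def required_silence_ms(partial_transcript: str) -> int:
    """Silence (ms) to wait before treating the user's turn as finished."""
    text = partial_transcript.rstrip()
    if not text:
        return 1000  # nothing recognized yet: be patient
    if text.endswith("..."):
        return 1500  # trailing off: likely still thinking
    if text.endswith((".", "!", "?")):
        return 300   # looks like a complete sentence
    if text.endswith(",") or text.split()[-1].lower() in FILLER_WORDS:
        return 1500  # mid-clause or hesitating
    return 700       # ambiguous: moderate default


def is_end_of_turn(partial_transcript: str, silence_ms: int) -> bool:
    """True once the observed silence reaches the transcript-dependent threshold."""
    return silence_ms >= required_silence_ms(partial_transcript)


if __name__ == "__main__":
    print(is_end_of_turn("What's the weather in Berlin?", 400))  # True
    print(is_end_of_turn("So I was thinking and", 800))           # False
    print(is_end_of_turn("Let me think...", 800))                 # False
```

The punctuation rules could be swapped for a learned completeness score while keeping the same structure: a short wait after complete-sounding input, a longer one after hesitation.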
[2025-05-05 20:53:15,808] [WARNING] [real_accelerator.py:194:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
Error loading model for checkpoint ./models/Lasinya: This op had not been implemented on CPU backend.