by alias_neo
542 days ago
All of these components are available separately or as add-ons for Home Assistant.

I currently do STT with heywillow[0] and an S3-Box-3, which uses an LLM running on a server I have to do incredibly fast, incredibly accurate STT. It uses Coqui XTTS for TTS, with a very high quality LLM-based voice; you can also clone a voice by supplying it with a few seconds of audio (I tested cloning my own with frightening results).

Playback to a decent speaker can be done in a bunch of ways; I wrote a shim that captures the TTS request to Coqui and forwards it to a Pi-based speaker I built, running MPD, which then requests the audio from the TTS server (Coqui) and plays it back on my higher-quality speaker instead of the crappy ones built into the voice-input devices.

If you just want to use what's available in HA, there's all of the Wyoming stuff, openWakeWord (not necessary if you're using this new Voice PE because it does on-device wakeword), Piper or MaryTTS (or others) for TTS, and Whisper (faster-whisper) for STT, or hook in something else you want to use. You can additionally use the Ollama integration to hook it into an Ollama model running on higher-end hardware for proper LLM-based reasoning.

[0] heywillow.io
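For anyone curious what the MPD half of a shim like that could look like, here is a rough sketch, not the actual shim. It assumes the TTS server will return audio for a plain GET URL; the `/api/tts` path, the `text` parameter and the hostnames are placeholders for illustration, so check them against whatever your TTS server really exposes. The MPD side uses MPD's standard line-based TCP protocol (`clear`, `add`, `play`) on the default port 6600.

```python
import socket
from urllib.parse import urlencode


def build_tts_url(tts_base: str, text: str) -> str:
    """Build the URL MPD will fetch. '/api/tts' and 'text' are assumed names."""
    return f"{tts_base.rstrip('/')}/api/tts?{urlencode({'text': text})}"


def mpd_commands(stream_url: str) -> list[str]:
    """MPD commands that replace the queue with one URL and start playback."""
    # MPD arguments are double-quoted; backslashes and quotes must be escaped.
    escaped = stream_url.replace("\\", "\\\\").replace('"', '\\"')
    return ["clear", f'add "{escaped}"', "play"]


def send_to_mpd(host: str, commands: list[str], port: int = 6600) -> None:
    """Send commands over MPD's TCP protocol and check each reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        reader = sock.makefile("r", encoding="utf-8", newline="\n")
        greeting = reader.readline()  # MPD greets with "OK MPD <version>"
        if not greeting.startswith("OK MPD"):
            raise RuntimeError(f"unexpected greeting: {greeting!r}")
        for command in commands:
            sock.sendall((command + "\n").encode("utf-8"))
            # Each reply ends with "OK" on success or "ACK ..." on error.
            while True:
                line = reader.readline()
                if line == "" or line.startswith("ACK"):
                    raise RuntimeError(f"{command!r} failed: {line!r}")
                if line.strip() == "OK":
                    break
```

With a speaker Pi at the made-up hostname `pi-speaker.local`, you would call something like `send_to_mpd("pi-speaker.local", mpd_commands(build_tts_url("http://tts.local:5002", "The lights are on")))`. MPD then pulls the audio from the TTS server itself, so the shim never has to touch the audio data.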