| thank you for building an amazing product! I suspect cloning OpenAI's API is done for compatibility reasons. most AI-based software already support the GPT-4 API, and OpenAI's official client allows you to override the base URL very easily. a local LLM API is unlikely to be anywhere near as popular, greatly limiting the use cases of such a setup. a great example is what I did, which would be much more difficult without the ability to run a replica of OpenAI's API. I will have to admit, I don't know much about LLM internals (and certainly do not understand the math behind transformers) and probably couldn't say much about your second point. I really wish HomeAssistant allowed streaming the response to Piper instead of having to have the whole response ready at once. I think this would make LLM integration much more performant, especially on consumer-grade hardware like mine. right now, after I finish talking to Whisper, it takes about 8 seconds before I start hearing GlaDOS and the majority of the time is spent waiting for the language model to respond. I tried to implement it myself and simply create a pull request, but I realized I am not very familiar with the HomeAssistant codebase and didn't know where to start such an implementation. I'll probably take a better look when I have more time on my hands. |
Some of the example responses are very long for the typical home automation usecase which would compound the problem. Ample room for GladOS to be sassy but at 8s just too tardy to be usable.
A different approach might be to use the LLM to produce a set of GladOS-like responses upfront and pick from them instead of always letting the LLM respond with something new. On top of that add a cache that will store .wav files after Piper synthesized them the first time. A cache is how e.g. Mycroft AI does it. Not sure how easy it will be to add on your setup though.