|
It feels like it's the economics holding it back at this point. I cobbled together my own smart home voice assistant over a weekend a few weeks ago, sitting on top of the OpenAI APIs (Whisper, GPT-4), and of course using Porcupine for wake word detection. It can do things I could never get the commercial products to do properly. For example, I gave it a memory: when a user command comes in, I have GPT-4 evaluate whether it can be executed immediately or requires later follow-up. When a sensor event happens, the machinery re-prompts GPT-4 with the user command backlog, the sensor backlog and the current state, and it figures things out. That way, things like "Please turn off the lights after I leave the room" now work just fine, and all it takes is an afternoon of hacking and a PIR sensor on my little DIY Homebrew-lexa wood boxes. And of course it's also much better at interpreting natural language commands "in spirit" or "creatively". I'm sure Amazon, Google & Apple have run all of these tinkering experiments, too, but deploying LLM-backed voice services to tens of millions of users just isn't affordable yet, especially when you factor in risk and liability. |
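For anyone curious what that "memory" loop might look like, here is a rough sketch of the idea, not the actual code. Everything in it is an assumption made for illustration: the `Assistant` class, the JSON request/reply shape (`defer`, `actions`, `resolved`), and `fake_llm`, which is a hard-coded stand-in for the real GPT-4 call so the example runs offline.

```python
import json
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for the model call: takes a JSON prompt, returns a JSON reply.
LLM = Callable[[str], str]


@dataclass
class Assistant:
    llm: LLM
    state: dict = field(default_factory=dict)       # current device state
    pending: list = field(default_factory=list)     # user command backlog
    sensor_log: list = field(default_factory=list)  # sensor backlog

    def on_command(self, text: str) -> None:
        # Ask the model: execute now, or keep for later follow-up?
        request = {"kind": "command", "command": text, "state": self.state}
        reply = json.loads(self.llm(json.dumps(request)))
        if reply.get("defer"):
            self.pending.append(text)
        self._apply(reply.get("actions", []))

    def on_sensor_event(self, event: dict) -> None:
        # Re-prompt with command backlog, sensor backlog and current state.
        self.sensor_log.append(event)
        request = {
            "kind": "sensor",
            "pending": self.pending,
            "sensor_log": self.sensor_log,
            "state": self.state,
        }
        reply = json.loads(self.llm(json.dumps(request)))
        self._apply(reply.get("actions", []))
        resolved = set(reply.get("resolved", []))
        self.pending = [c for c in self.pending if c not in resolved]

    def _apply(self, actions: list) -> None:
        for action in actions:
            self.state[action["device"]] = action["value"]


def fake_llm(prompt: str) -> str:
    """Hard-coded replacement for the real model, for demonstration only."""
    request = json.loads(prompt)
    if request["kind"] == "command":
        return json.dumps({"defer": "after" in request["command"], "actions": []})
    left_room = request["sensor_log"][-1] == {"sensor": "pir", "motion": False}
    if left_room and request["pending"]:
        return json.dumps({
            "actions": [{"device": "lights", "value": "off"}],
            "resolved": request["pending"],
        })
    return json.dumps({"actions": [], "resolved": []})


assistant = Assistant(llm=fake_llm, state={"lights": "on"})
assistant.on_command("Please turn off the lights after I leave the room")
assistant.on_sensor_event({"sensor": "pir", "motion": False})
```

With a real model behind `llm`, the interesting part is the prompt design; the bookkeeping around it stays about this small.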
Huge models are running on stock laptops. There is no need to send it to the cloud. They had no problem shipping e.g. sound recognition (reacting to alarms, coughing, crying etc.) that runs only on selected devices. iPads have M1/M2 chips. And a home assistant model does not need a detailed understanding of neuroscience, the best Haskell patterns etc.
But all transformer development is pretty fresh corpo-wise. I think building a good, safe dataset that doesn't infringe any copyrights etc. is really hard. And they probably have to be very careful about it, since it's not "only" about getting sued, but also about potentially damaging partnerships they need for TV/books/music.
Btw have you tried any locally running models instead of GPT-4 for your automation? I don't want my HA touching the Internet unless necessary for 3rd party integrations, but GPT-4 sets the bar pretty high.