by didntcheck
1095 days ago
I imagine that part of the problem is that so far LLMs have primarily been "language in; language out". They are almost always used with the input and output wired up to a human, whereas for voice control (i.e. not just answering questions) you need the model to produce structured output that the computer behind it can understand. There has certainly been some success in that area, like the article about JSON from GPT4 posted yesterday [1]. I'm quite certain that Amazon and the others will be putting a lot of time into investigating this avenue; it's just not ready for rollout yet.

I'm looking forward to it. I have an Echo, and it definitely does use a lot of fuzzy matching to understand basic rewording of queries, but it's still very much a limited set of defined "functions", just with a fuzzy call site. The user has to break their high-level intent down into small requests built from those primitives, keep all the "variables" in their head, and pass them to each "function call". Smarter assistants that can remember context are something that's been worked on and allegedly deployed almost since the start, but ChatGPT is the only thing I've seen that lives up to the promise in the real world.

[1] https://news.ycombinator.com/item?id=36330972
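To make the "structured output" point concrete, here's a rough Python sketch of what the computer side might look like. Everything in it is an assumption for illustration: the function names (`set_light`, `set_timer`), the JSON shape, and the example string are made up, and the JSON is hand-written rather than produced by any real model or by how Alexa actually works.

```python
import json

# Hypothetical device primitives; a real assistant would call device APIs here.
def set_light(room: str, on: bool) -> str:
    return f"light in {room} turned {'on' if on else 'off'}"

def set_timer(minutes: int) -> str:
    return f"timer set for {minutes} min"

# The fixed set of "functions" the computer behind the assistant understands.
HANDLERS = {"set_light": set_light, "set_timer": set_timer}

def dispatch(model_output: str) -> list[str]:
    """Parse JSON (assumed to come from a language model) and run each call."""
    calls = json.loads(model_output)
    results = []
    for call in calls:
        handler = HANDLERS.get(call.get("name"))
        if handler is None:
            # The model asked for something outside the defined primitives.
            results.append(f"unknown function: {call.get('name')}")
            continue
        try:
            results.append(handler(**call.get("args", {})))
        except TypeError as exc:
            # Wrong or missing argument names in the structured output.
            results.append(f"bad arguments for {call['name']}: {exc}")
    return results

# Hand-written stand-in for what a model might emit for
# "turn off the kitchen light and wake me in 20 minutes".
example = (
    '[{"name": "set_light", "args": {"room": "kitchen", "on": false}},'
    ' {"name": "set_timer", "args": {"minutes": 20}}]'
)
print(dispatch(example))
```

The idea is that the model, not the user, would do the breaking down of one high-level request into several primitive calls, and the dispatcher only has to validate and execute them.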