Hacker News
by modeless 883 days ago
I agree, I just think these new projects based on LLMs should be considered a different category, AI agents or something. The traditional grammar-based voice assistant architecture used by Google Assistant, Siri, Alexa, Bixby, etc is an abject failure, and not for lack of trying.

If Alexa/Echo-category devices are meant to be the widely flexible, pervasively useful computer from Star Trek, they're a failure.

If they're meant to be a usable $30 kitchen timer and music player, they're pretty great.

Google Voice Search used the internal codename "Majel", a reference to Majel Barrett, who voiced the Star Trek computer. That was explicitly the ambition. It just didn't work out.
OK. I believe you. They accidentally created something else that was very useful but different from what they set out to do.

Is that a story of failure or of success?

Starbucks launched to sell beans and espresso machines. YouTube launched as a video dating site. Are they also failures?

I guess where we disagree is "very useful". If Google Assistant stopped working tomorrow I would hardly care at all. There are a couple of scenarios where it's slightly more convenient than using my phone (assuming I don't encounter one of its many failure modes) and that's about it. I'm sure the hands-free aspect is important for certain people in certain situations, but I think the vast majority of people just don't see a lot of value in it.
Amazon alone has sold over half a billion Alexa-enabled units (around 10 of them to me).

I think people see more than $30 of value in them, or at least that's what their revealed preferences suggest.

Those sales were subsidized in expectation of future profitability that will never come (at least not without a ground-up redesign of the product around LLM-based AI agents or some other paradigm). Economically Alexa is a "colossal failure": https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-co...
Yeah, my crock pot and ceiling fan are "Alexa-enabled". That's two.
But how often did that computer voice-acted by her really do significantly more than "set timer to thirty minutes"? Outside of some broken plots on the original series ("we have insufficient data to know the truth, let's ask the computer who will tell us anyways!"), it really was mostly mundane voice assistant stuff.

(I'm deliberately excluding the "ten words to 'author' a holodeck scene" part; that had always stretched my imagination a little too far, more "this can't work!" than space travel and transporter beams. Then Stable Diffusion happened.)

There were some scenes where Riker on the bridge asked the computer essentially a SQL query: "give me a list of star systems with parameters that fit X and cross-reference by Y..." "There are 3 systems which fit your query: ..."
That might have been the initial hope of the team, before Google killed it. It's been in the graveyard for years with zero updates; my Google Assistant Nest Mini is arguably worse than when I bought it.

I believe it would have been possible to make it better, but they didn't try; they just gave up, as with so many Google products.

I think way too much money has been sunk into those projects if their ambition is just to be a $30 kitchen timer and music player.
Yup, as far as something that can turn on a light, run a timer, convert some units, tell me the weather, and have an -okay- shot at some categories of random questions instead of me getting out my computer, the Google Assistant is just fine.
They do succeed in that way many times a day, every day, in my house.
As '80s telescreens they were fantastic.
I think many parts of the architecture can be reused: in Alexa terms, all of the “skills” that integrate the assistant with various other services. IMO one of the main problems with assistants is that I don’t know what skills are available or how to invoke them. It’s like I’m a wizard who has to memorize all the spells I could be casting. It never happens because I don’t care enough. I think LLMs could potentially help by making it easier to discover and invoke those skills.
This "spells" analogy is such a great way to explain how it feels to me to use these assistants. I'll play with one if I'm at a friend's house, but honestly can't see the appeal. Telling Google to change the color or brightness of the lighting just seems like mostly a gimmick, unless maybe you're disabled, in which case it may be a big quality-of-life improvement. The other stuff doubly so.

With ChatGPT I can see the appeal for certain tasks like having it create a custom text adventure for you, but I can't see it being too useful in my day to day life yet.

"Skills" will be obsolete very soon. AI agents will use the same software tools and services that humans do. They won't need special separate AI-only interfaces.

I'm not excited about the Rabbit R1 as a hardware device, but their software vision is exactly right, and new startups attacking this problem are coming out of stealth seemingly every day now.

Skills are just APIs that conform to a common format. We'll definitely continue to have AI-only or developed-for-AI APIs for future "agents" to act against. They probably won't spend much effort formatting text to sound good to a person, but the infrastructure is here.
I disagree. These special APIs will not have the breadth of capabilities that the human UI does, so AIs will use the human UIs out of necessity. But I think in the long term we will eventually see a simplification of UIs. As it becomes less common for humans to actually use them, they will no longer need fancy animations or dark mode or client-side validation or pretty styling. In the extreme, a return to plain HTML forms that a human can use in a pinch but are mostly used by AI agents. At that point I guess you're blurring the lines between UI and API.
Isn't it the exact opposite? Interfaces we use every day can be dead simple, all they need is that they don't change behind our back. The accelerator pedal does not come with a footover pop-up "keep pressed to make car go". Interfaces we use once in a leap year on the other hand, that's where we need all the hand-holding we can get.