by rexreed 1309 days ago
The voice assistant space is dying: https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-co...

Some of the comments below are part of the explanation why. It doesn't work as well as people were hoping, and it's a solution in search of a problem, with limited application and, it seems, little monetization. The above article sums it up better from the big tech company's perspective.

13 comments

Voice assistants which are trying to force engagement to squeeze money out of you are dying.

Most people only use the voice assistants for a few simple tasks, which is perfect for an open source project like mycroft. It is, however, very, very bad for Amazon and Google, because those tasks don't make them money. That's why they're all going so aggressive on "you asked for the time, but by the way here's a 5 minute speech on all the easily monetizable tasks I can do instead"

People like the idea of voice assistants, but by and large they don't like all the problems associated with a voice assistant run by Amazon, Google, and Microsoft.

Yup - I think this is the truth. I'm willing to spend several hundred dollars right this second for a simple voice assistant for things like weather, time, timers, unit conversion, alarms, and home assistant control (mainly lights).

I've actually pre-ordered the Mycroft Mark 2, although no chance to evaluate it yet.

I'm very interested in devices that can do this locally.

I'm not interested in Alexa/Google home anymore AT ALL - I've gone that route, they both work, but they want my dollars all the time, and it's become increasingly clear that if they can't get me making purchases through those devices - they will kill them off, or become ever more scummy in the attempt (Alexa is now including ads in the "did you know" section - "did you know" was already a fucking terrible decision to include, since it's going to marginally increase interaction at the expense of huge user dissatisfaction. But putting ads there has made me leave.)

So basically - I think if anything, we're seeing a speed run of 90s/2000s tech company boom/bust. A huge amount of money poured into the space with no real idea of how to sustainably profit, but the space itself doesn't feel like it's going anywhere.

It's really, really compelling to allow voice control in all sorts of interactions - but it needs to be very clearly working with me, and not trying to subvert my intent for profit. That might even mean it needs to fall back to something like "if this command, then that action" style usage. No more changing commands, no more bullshit ads, no more subversion of what I'm asking it to do.

It needs to obey me, not google or amazon. Otherwise it's a sales rep and not a digital assistant.

>they both work,

I got an Echo (Alexa) for free and use it for Home Assistant. It only works when I have an internet connection, so when my internet is out, I cannot turn my lights on/off with it. I understand why, but I too would REALLY like everything home automation depends on to be local.

I use Mycroft with the home assistant vm running on Proxmox. I’m surprised how easily they integrate. And when the internet goes out I can locally control things from a laptop.
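
For anyone wondering what "locally control things from a laptop" can look like in practice, here is a minimal sketch that talks to Home Assistant's REST API over the LAN, with no cloud hop. It is one possible approach, not necessarily the setup described above. The address, the token placeholder, and the entity id in the usage note are assumptions for illustration.

```python
import json
import urllib.request

# Assumptions: Home Assistant is reachable on the LAN at this address and a
# long-lived access token has been created in the user profile.
HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_TOKEN"  # placeholder, not a real token


def build_service_call(domain, service, entity_id):
    """Build a request for Home Assistant's REST service endpoint."""
    return urllib.request.Request(
        url=f"{HA_URL}/api/services/{domain}/{service}",
        data=json.dumps({"entity_id": entity_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HA_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_service(domain, service, entity_id, timeout=5):
    """Send the request over the local network and return the HTTP status."""
    request = build_service_call(domain, service, entity_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Something like call_service("light", "turn_on", "light.living_room") would then switch a (hypothetical) light on as long as the local network is up, whether or not the internet connection is.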
> And when the internet goes out I can locally control things from a laptop.

While that's an improvement, I don't want a cloud-dependent system except where it's unavoidable, like having it read me the news.

I'll have to check out the specifics - but your full setup does not seem like an improvement for me. If alexa is down, I can open up the app on my phone for each smart device manufacturer and manually control things that way. They only need my local wifi network to be functional, internet access not needed.

Can a raspberry pi handle the server functionality I wonder?

I worked on a project in 2016 that told me all I needed to know about the space. It was an online voice assistant and I couldn't find myself wanting to interact with it. Even though I spent a lot of time on the project, I scrapped it, because it was just lame. It looked kind of cool, but was lame.

I personally don't think there is enough cybernetica to control with voice. At some point there may be, but right now, the internet is just one giant consumption stream with a few searches and purchases now and then.

That digital daemon experience taught me that I care more about physical intelligence than verbal intelligence when it comes to my technology. I'm verbally intelligent myself, I don't need an AI who can't even speak correctly, let alone understand me, be my verbal interface to the world.

Honestly - I don't think anyone really wants an "online voice assistant". The key problem word there is "online".

There is no way that a device can meaningfully parse information from the internet and present it to you right now. At some point? Sure maybe. But it's definitely not now.

What I want, and what I was pretty clear about in my list of use cases is the ability to push a button (or run a function with parameters) with my voice.

My exact use cases

- Kitchen timer: StartTimerFor(duration, timerName?), HowMuchTimeOnTimer(timerName)

- Alarms: SetAlarmFor(time, alarmName?), WhatAlarmsAreSet(), CancelAlarm(alarmName | time)

- Lights/devices(ex: tvs, ACs, etc): TurnOff(DeviceName | RoomName | ALL), TurnOn(DeviceName | RoomName | ALL)

- Conversions: WhatIs(numberOfUnits, inSecondUnit)

- Current time: WhatTimeIsIt(inlocation?)

- Current weather (optional, I don't use this a ton and it requires an upstream source, although HomeAssistant already gets that info for me): WhatWeatherIsIt(atTime?)
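
To make the "push a button with my voice" idea concrete, here is a rough sketch of a dispatcher that maps already-transcribed text onto a couple of the functions listed above. Everything in it is an assumption for illustration: the regex patterns, the snake_case handler names (loosely modeled on StartTimerFor and TurnOn/TurnOff), and the in-memory state. It does no speech recognition and uses no real assistant API.

```python
import re

# Hypothetical in-memory state; a real setup would talk to timers and devices.
timers = {}
devices = {"kitchen lights": False, "tv": False}

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}


def start_timer_for(amount, unit, name=None):
    """Rough analogue of StartTimerFor(duration, timerName?)."""
    name = name or "default"
    timers[name] = int(amount) * UNIT_SECONDS[unit]
    return f"timer '{name}' set for {timers[name]} seconds"


def turn(state, target):
    """Rough analogue of TurnOn/TurnOff(DeviceName | ALL)."""
    targets = list(devices) if target == "all" else [target]
    for t in targets:
        if t not in devices:
            return f"unknown device '{t}'"
        devices[t] = (state == "on")
    return f"turned {state}: {', '.join(targets)}"


# Fixed patterns: if this command, then that action. Nothing is guessed.
COMMANDS = [
    (re.compile(r"^set (?:a )?(?:(?P<name>\w+) )?timer for "
                r"(?P<amount>\d+) (?P<unit>second|minute|hour)s?$"),
     lambda m: start_timer_for(m["amount"], m["unit"], m["name"])),
    (re.compile(r"^turn (?P<state>on|off) (?:the )?(?P<target>.+)$"),
     lambda m: turn(m["state"], m["target"])),
]


def handle(text):
    """Dispatch already-transcribed text to the first matching command."""
    text = text.lower().strip()
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            return action(match)
    return "sorry, I don't know that command"
```

The point of the fixed table is that an unrecognized phrase gets a plain refusal instead of a guess or an upsell; the other commands in the list would just be more rows.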

---

I've found basically all of the voice assistants I interact with are actually really damn good about understanding roughly those patterns (Alexa was the best, but only for the first year or so [honestly - it was actually wonderful as a beta product] - it's gone markedly downhill over the last several years as they try to cram in more detection and more features).

They just insist on trying to sell me on other parts of the experience (check out this tv show, there's a sale on, use this product, did you know this? did you know that? etc). And I don't want it.

But I'm more than happy to pay a fair sum of money for a thing that will just reliably do those commands.

It's incredibly liberating to be able to do those things with my hands busy, or my eyes closed, or while lying down.

Tack on an HDMI port or a display so I can view a recipe and I will literally give you money right now for this thing (and I have - since I've tried most commercial voice assistants).

Long term - I'll probably end up just cobbling together my own version using Rhasspy/Mycroft or another text parser/STT engine, and HomeAssistant or OpenHab if I can't get what I want commercially. I just know that I'll end up spending more money and time on it that way (which is not the end of the world, I just have other hobbies at the moment, and a very young child).

> Most people only use the voice assistants for a few simple tasks, which is perfect for an open source project like mycroft.

I'll certainly grant that... but the price point where Mycroft is, is certainly not near what I'd pay for doing those few simple tasks.

Apple is at the upper end of what I'd spend for such a device (the HomePod mini is $99) - and that's because I'm fairly invested into the Apple ecosystem and thus it can make use of the iTunes library, home automation, calendar items, etc...

If I wasn't invested in Apple, then none of the home assistants other than Amazon (because of the price point for the echo) would be particularly interesting.

I've got an Echo Show because it's a very nice, simple clock/weather interface (that's got Alexa behind it) too (I really liked the Ambient 7 day weather clock when it was available). I've got an Echo Wall Clock that is paired with the Echo in the kitchen - it makes timers nicely visible (a sibling of mine has an Echo Wall Clock because it's an analog dial that doesn't have any sound with it).

The problems with Alexa of suggesting by the way ("Alexa, stop by the way" - give it a try and yes, it is routineable) are tolerable for how much I'm paying for them and the functionality that I use it for.

The Ars article on Alexa's financial crash-and-burn inside Amazon missed a lot of the reason people aren't willing to engage with Alexa as much as they could or would, if things were different. First, the privacy aspects are significant. Secondly, the value proposition is just not there - worse, Amazon has deliberately broken one of the most useful things you could do with Echo products: using them for distributed networked audio, a la Sonos: The new generation Echo Show products ELIMINATED the audio output jack, so you can't even plug the output into a stereo or speaker now!

On top of that, the Echo products are just not well built, not well thought out, and have NOT been upgraded to make them better: They update, but with NO visible benefit to the owner. One example: The Echo Show 8 cannot and will not keep its display off all night, even if you explicitly command "display off" before going to bed (yes, it does understand and temporarily obey this command!). But sometime during the night, something will wake it up, and the damn thing turns into a lighthouse in your bedroom, waking one of us up.

I'd really like to find Alexa more useful, but like most folks I know with one, it's mostly just useful as a glorified voice-controlled radio - I'd use it more to control lights and such, if I could get the damn thing to actually realize what lights are in what rooms, and that dimmer switches and smart lights can indeed share a location that should be controlled together. (Yes, this is supposed to work, but it doesn't...)

I would pay $500 to outfit the house with a central voice recognition processor that would be capable of supporting a dozen or so very secure listeners on the local LAN. Mycroft isn't that solution.

> The problems with Alexa of suggesting by the way ("Alexa, stop by the way" - give it a try and yes, it is routineable) are tolerable for how much I'm paying for them and the functionality that I use it for.

So cold comfort since it’s annoying as hell, but it slowly learns you don’t like it and will back off its frequency. Amazon unsurprisingly tracks “dissatisfaction” responses and adapts rate of things (globally and individually) so you do actually have to cuss out Alexa to change it. It’s slow because obviously it’s profitable but it does happen.

> That's why they're all going so aggressive on "you asked for the time, but by the way here's a 5 minute speech on all the easily monetizable tasks I can do instead"

This is a word-for-word description of how Siri originally functioned. "You asked for the top 5 romantic restaurants nearby; here are the top results from Google Search:"

GP isn't talking about bad fallback answers where it punts you to a search page. It's more like when you ask "Hey Alexa, what is the time" and it says "The time is 5:45 PM. By the way did you know you can buy ribbons for the holidays on Amazon by saying..." i.e. things that are blatantly unrelated to answering your question and often trying to sell you something.

It’s annoying because I kind of understand advertising the capabilities, but for Siri, for example, I cannot find a documented list of all the commands it can “understand” and so I can’t learn how best to use it. I just have to guess and hope I get close enough.

I don't think the entire space is dying. Amazon is having problems because Alexa has little benefit to them outside of direct monetization. They wanted people to use Alexa to buy things and no one wants to shop like that. So they have these devices and all this infrastructure to run people's kitchen timers, lights and play music. People will buy Alexa devices on Prime Day, use them dozens of times a day for years, and never make a dime for Amazon.

Apple isn't necessarily in the same boat. Siri isn't particularly good, but it does all those things well. Most importantly, it keeps people on iPhones and in the Apple ecosystem, which does make money.

You already have the phone and probably have a device that works with HomeKit so why not try it out. Next, you buy some new lights. Before you know it, you're controlling most of the lights in your house, streaming Apple Music and setting kitchen timers from your Apple Watch. Next time you need a new phone, you're not even going to think about anything else because if you change you won't be able to turn on your lights anymore.

Apple has a plan that works and Amazon doesn't.

This makes no sense to me. Apple's plan works because... their lock-in is better? I can "control most of the lights in my house, stream Apple music and set kitchen timers" with Alexa, Google Assistant, Cortana and even Bixby. What is Apple's actual advantage here? How is Apple making money from this when Amazon does not?
Apple makes money from the devices it sells. Siri makes those devices more convenient to use.

Amazon is selling the devices at or below cost and hoping that Alexa will make the money (which it doesn’t)

That only tells me which one will exist longer, not that "Apple's plan works". I've genuinely seen zero people deliberately use Siri (on iPhone or Mac) over the past 5 years. Apple is certainly losing money on Siri too.
If you look at Siri in a vacuum, you're almost certainly correct that it isn't a moneymaker. But Siri isn't in a vacuum. It comes with a device with an average cost of like $1000. It may not necessarily be widely used, but the ones who do use it are highly likely to remain in the Apple ecosystem and use other products and services that are highly profitable.

Look at all those Korean novelas and shovelware on Netflix. There are cohorts of subscribers who remain highly loyal because they're into it. Netflix isn't necessarily swinging for the fences with high brow, popular content that competes with the best studios in the world. Instead, they pump out a wide variety of content that keeps the maximum number of people subscribed.

Apple is similar. They promote features - whether it's Siri, health, privacy, family sharing/controls, etc. - that will strongly appeal to some cohort and keep them on the platform. Then, they incrementally hook you into services until you're buying $1000 devices for the whole family and paying $30/mo for the services bundle. And once you're there, they have you because the switching cost involves turning your digital life upside down.

I was one of those zero people until I realized I can tell Siri "add an appointment with Blahblah next tuesday at 11", it will actually understand that and it takes less time than using the calendar interface.

I'm sure enough people find some small use for the voice commands that it's a good feature to have on the phones.

Siri really shines with the HomePod or the watch I’ve found.
Apple's devices are smarter. This makes them cost more for the "same" hardware, but cost less for the computation.

Apple isn't trying to make money with Siri. It's using Siri to make its ecosystem of Apple Music and similar more valuable to its customers.

The limits that Apple puts on what it can do makes that cloud side computation less expensive.

---

Consider that bit - less expensive. Apple doesn't run its own cloud in the way that Google, Amazon, or Microsoft do. So what does Alexa cost? It costs for AWS cloud time. That's the expense that it's running. Those skills that people use run on AWS compute time rather than a phone's local cpu and battery.

A Google search "costs" about 1 kJ of energy. Alexa has similar costs somewhere just for energy and other costs for the maintenance of the additional software and content. It costs something to maintain that joke database.

I don't think Apple is smarter. Siri is really basic.

I can't even say "Turn on the lights and the fan". I have to present it all in bite-sized chunks. Come on how hard is that.
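
For what it's worth, the bite-sized-chunks part can at least be approximated in front of an assistant. Below is a naive sketch that expands a compound command into single commands before handing them on. The verb list and the split on " and " are assumptions for illustration, and this says nothing about how Siri actually parses requests.

```python
import re

# Hypothetical verb phrases; anything else is treated as a bare target.
VERB = re.compile(r"^(turn on|turn off|dim|open|close)\s+(.*)$")


def split_compound(command):
    """Expand 'turn on the lights and the fan' into single commands.

    Naive approach: split on ' and ', and reuse the most recent verb
    for any clause that does not carry its own.
    """
    clauses = [c.strip() for c in command.lower().rstrip(".!?").split(" and ")]
    result = []
    last_verb = None
    for clause in clauses:
        match = VERB.match(clause)
        if match:
            last_verb, target = match.group(1), match.group(2)
        elif last_verb:
            target = clause
        else:
            # No verb seen yet: pass the clause through untouched.
            result.append(clause)
            continue
        result.append(f"{last_verb} {target}")
    return result
```

It falls over as soon as a device name itself contains "and", which is probably part of why the real thing is harder than it looks.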

I have a feeling there is no AI in these supposedly intelligent assistants. Only scripted stuff.

There is more processing done in the Siri local device than there is in the Alexa local device.

It's not "Siri is smarter" but rather "Apple is working on minimizing cloud costs because it is entirely a cost for them."

Siri is scripted and limited to make it locally "smarter".

With Amazon and Alexa, everything goes to AWS because that's where the entirety of Alexa's processing happens. The hardware devices in the homes are "dumb" terminals for AWS with a voice interface. This allows Amazon to make use of AWS as much as it can and do something with "surplus" computing power that it has available on AWS.

Amazon has been working on on-device voice for a while. Actually everyone is trying to do that. Running large speech models in the cloud is expensive, considering the number of devices, they probably need more than "surplus" :)

https://www.amazon.science/blog/on-device-speech-processing-...

There is no AI beyond run of the mill NLU/ML stuff. It’s just trying to guess intent based on your query.

“What’s it like outside right now?” => GetWeatherIntent arguments=[time=current location=device_location]

There’s no omni-intelligent being making API calls anywhere. It’s NLU with one or more decision trees based on the intent and arguments.
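
As a toy version of the query-to-intent step described above: the sketch below picks an intent by keyword overlap and fills in default arguments. Only GetWeatherIntent and its time/location arguments come from the example; the other intent names and all keyword lists are made up, and a production system would use trained NLU models rather than keyword sets.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists; production NLU uses trained models instead.
INTENT_KEYWORDS = {
    "GetWeatherIntent": {"weather", "outside", "rain", "temperature"},
    "GetTimeIntent": {"time", "clock"},
    "SetTimerIntent": {"timer", "countdown"},
}


@dataclass
class Intent:
    name: str
    arguments: dict = field(default_factory=dict)


def tokenize(query):
    """Lowercase the query and strip punctuation from each word."""
    words = (w.strip("?.!,'\"“”’") for w in query.lower().split())
    return {w for w in words if w}


def guess_intent(query, device_location="device_location"):
    """Pick the intent whose keywords overlap the query the most."""
    tokens = tokenize(query)
    scores = {name: len(tokens & kws) for name, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return Intent("FallbackIntent")
    args = {}
    if best == "GetWeatherIntent":
        # Defaults mirror the example above: current time, device location.
        args = {"time": "current", "location": device_location}
    return Intent(best, args)
```

Once the intent and arguments are picked, everything downstream is ordinary branching on a name and a dict, which is the "decision tree" part.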

I don't know if it's device specific but I use "Hey Siri, turn on the overhead lights and the floor lamp" for example all the time. I have a HomePod though.
Weird, I have a homepod (mini) also. In fact I hardly have any Apple stuff anymore, I only got homepods because it was the most privacy-friendly option out of the big three. I just have an iPad from work which I used to set them up. And most of my automation goes through Home Assistant anyway.

When I give a double command Siri literally tells me she can't do two things at the same time and I have to present them as two separate commands.

Perhaps it's because I have it set to UK English? Perhaps it's smarter in US English. I'll have to try that.

I think the point is that Apple produces software that doesn't directly make them money as part of their business model. Pages, Numbers, iMovie, Maps are all given away for “free” with devices. Siri is just another example of that. As you say, being able to talk to your phone is table stakes, not an advantage. But having that offering keeps sales of Apple hardware moving.
Seems odd right? Just charge more than break-even for Alexa and you have a business?
Not necessarily because the cost increases through increased use by the customer. The idea was to sell hardware at close to cost and then monetize the customer. The likely assumption was that Alexa users would buy more similar to how Prime customers buy more. Ideally, Alexa customers were supposed to use services like "subscribe and save" and then randomly tell Alexa to order more toilet paper or laundry detergent. Amazon wanted all the household stuff on recurring subscriptions. It would have been great. Increased revenue, better ability to bundle shipments together to cut costs, customers who don't even look at prices anymore. Instead, they sell a device for which they incur an operating cost while producing little to zero increased revenue and subscriptions.
Right, I read the Ars Technica article, I get the original idea. It turns out that monetization strategy doesn't work and the Alexa business unit is losing billions of dollars.

I'm saying, new business idea: give up on the old strategy of Alexa driving induced profit in other business units, instead charge more than the breakeven cost of building devices and running the service. This will likely be substantially more, and fewer Alexas will be sold. But the users that do actually get a lot of value from the device and service they are using will pay more for it.
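
The back-of-envelope arithmetic behind "charge more than the breakeven cost" might look like the sketch below. Every number in it is a hypothetical placeholder, not a figure from Amazon or the Ars Technica article, and the model ignores support, returns, and R&D.

```python
def break_even_price(hardware_cost, monthly_service_cost, lifetime_months):
    """Up-front price that covers hardware plus service over the device's life."""
    return hardware_cost + monthly_service_cost * lifetime_months


def break_even_subscription(hardware_cost, device_price,
                            monthly_service_cost, lifetime_months):
    """Monthly fee that covers the service plus any hardware subsidy."""
    subsidy = max(hardware_cost - device_price, 0)
    return monthly_service_cost + subsidy / lifetime_months


# Hypothetical numbers purely for illustration; none come from Amazon.
example_price = break_even_price(
    hardware_cost=30.0, monthly_service_cost=0.50, lifetime_months=48)
example_fee = break_even_subscription(
    hardware_cost=30.0, device_price=25.0,
    monthly_service_cost=0.50, lifetime_months=48)
```

Because the service term scales with both usage and device lifetime, heavier users push the break-even point up, which is the problem raised in the parent comment.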

The product/market fit to test is: How many customers would pay more for a device that's not trying to sell you things? Can you get a solid (but smaller) business just by charging more for the device? What features would you add to persuade the marginal user to pay more? Offline mode for privacy-conscious users? Lean in to home automation features? What about a true AI-powered personal assistant? What about per-user language training with a local model that gets refined by your voice samples, with that data not shared back to the cloud? Etc.

Simple startup-style product iteration stuff here. You had a customer hypothesis and a growth model, and it was proven unviable. So can you pivot to find a viable business?

Charge more than break even for Alexa and they won’t sell enough.
> Apple isn't necessarily in the same boat.

Also importantly, Siri mostly runs off your iPhone's processor, and Apple doesn’t have to pay a big cloud bill for it, unlike Alexa.

Apple and the Garden of Eden. Don't take a bite unless you're going all in.
The "voice assistant space" also includes Siri, Google Assistant, and Cortana, so it's not going anywhere.

I'd contend that it's absolutely not a solution in search of a problem; it's much more of an unsolved problem, and a big part of the "why" is

- voice recognition/assistance tech still maturing

- major players are insisting that the tech supports their walled gardens

- price points are still a problem

The last two create a conundrum: a lot of times tech prices come down by selling expensive stuff to rich people until the hardware becomes commoditized. But for a good voice assistant, you need a lot of up-front investment at scale. Unfortunately, the companies that are able to do this are also controlling the hardware that can use it, which limits its ability to spread and be useful.

This is why I think Mycroft is important to support:

1. If you can make voice assistant software open-source and plug-and-play, then it frees people up to tinker with form factors

2. Part of Mycroft's pitch to businesses is that they can make custom solutions. There are probably a thousand big businesses that might want to get into this space but don't want to rely on Amazon because they want to control the experience and not give up their data. Maybe Target wants to stick virtual assistants around their stores, or maybe a hospital wants to give tools for surgeons.

I also think there's an opportunity for voice control in home stereo, where someone decouples the speakers from everything else. It's still annoying to work with Bluetooth in 2022, and Sonos is still pricey, and another walled garden. I'd love to have a simple controller that connects a dumb speaker to Wi-Fi and lets me voice-control it to play music from a library of my choosing. That's not a thing yet, right?

> custom solutions

Especially interesting as voice recognition is much easier, cheaper and more efficient within the limited space of a specific usage.

> 1. If you can make voice assistant software open-source and plug-and-play, then it frees people up to tinker with form factors

For what it's worth, Google Assistant does have an open API to create new devices. It's not open source, but you can certainly experiment with your own custom form factors. There's even a tutorial:

https://medium.com/google-cloud/how-to-build-your-own-smart-...

>- price points are still a problem

Really? An Echo Dot is $25 on Amazon right now. Which, if you use it at all, is pretty reasonable. (To be sure, if I were using it for music to any degree, I'd probably get a model with better speakers.)

For music, I have an old phone connected to a stereo receiver. So it has voice control although I mostly pick a playlist or album manually.

>An Echo Dot is $25

Yup, price point is solved for the "just timers and some music" people, but now you're stuck in Amazon's orbit.

That's a pretty weak orbit though if that's all I'm using it for. (I do use mine for music sometimes but it's actually connected to Apple Music.) I could switch to a different assistant tomorrow if I wanted to. I've literally never ordered anything by voice--and can't really see doing so.
I mean, these things are credibly accused of violating the Federal Wiretap Act, it's not just about the walled garden:

https://epic.org/wp-content/uploads/privacy/internet/ftc/EPI...

EDIT: other point I forgot - original article says they're bleeding money, so clearly $25 isn't sustainable right now anyways.

It's not dying at all - it's an incredibly useful interaction style.

Those companies are failing to profit because they don't understand that a digital assistant needs to be working with me, locally, and not subverting my intent.

It just needs to be my device, and not a sales rep for google/amazon. I use voice controls all the time at home - it's astoundingly useful in all sorts of situations, and I'm not even disabled (where it's literally life changing in some cases).

A truly open platform stands a chance in the voice assistant space, as it could be adapted into forms that are useful beyond their current limited designs. Such useful forms probably are not as monetizeable as the current incarnations that invasively collect information about you and your family, so I very much doubt the big tech players will ever attempt to build these useful systems directly.

Unfortunately, Mycroft is not very open itself. Sure, most of the code is open and available, but I tried to contribute and found my PRs ignored for weeks. When they were finally ready to merge them, their poor response caused me to lose interest in the project. At that time, they did not seem interested in cultivating a strong developer community around their core technology components; they were doing their thing, and they wanted the community to implement “skills”. I got the impression that the community could either get on board or stand aside and watch them work. For that reason alone, I feel fairly certain that this project will fail eventually as well, and their hardware will become yet another high-tech relic of a paperweight.

As a formerly enthusiastic Kickstarter backer, I cannot recommend the Mycroft project as the basis for a product; you don’t own and can’t control the platform in any meaningful way (short of forking it). It might be a better choice than a closed platform, but not enough to make me want to put any money in it.

Why do you need to control it yourself in order for it to be valuable as an open-source project? Maybe the team has a specific vision for the product, and reading through PRs from random people online takes away from their limited resources.
...but you can fork it. Is the build process really too painful for that to be enough?
We use Google home assistant devices throughout the house, and find them quite useful. Use cases:

- controlling smart devices (thermostats, TVs, speakers)

- broadcasting messages

- reminders / tasks

- asking questions

However, none of these use cases generate any revenue for Google afaik.

Ours is a voice-driven music player for our kids. They love it. We have a YouTube Premium subscription just because of the Nest Minis we have in every room. Sometimes we ask it "What's the animal of the day?" or "Tell a story" or "What year was Abraham Lincoln born?" but mostly the Nest Hub Maxes we have are just photo slideshows, which we love, and sometimes we ask "What's the weather?"

That set of functionality alone makes them well worth the money for us.

With all due respect, I find that thesis absurd.

Maybe you meant it like, "the voice assistant space isn't going to generate huge profits, and thus giant corporations will lose interest".

But even that is absurd. They will still have to do it as a loss leader. Maybe not Amazon — because they just ship us our toilet paper and protein bars and shit. They don't have an "ecosystem" (although they gave it a halfhearted try a few times).

But the chance that in 2032 people just like... don't have voice assistants? It's literally zero, barring an actual WWIII cataclysm reversion-to-barbarism event.

> doesn't work as well as people were hoping

Nothing does, until it does...

> little monetization

Yep, that might be right. But it doesn't necessarily mean the space is "dying". Just that it might not be amenable to oligopolization.

I will never have a voice assistant unless a completely open and self-hosted solution appears on the market. And with the current patent landscape, that seems incredibly unlikely to happen before 2032.
OK, I can't keep reading this website any more tonight, but for fuck's sake you do realize that the submission you are commenting on is a completely open and self-hosted solution that is on the market, right?
Maybe one of the reasons behind this is that people use voice assistants, and search engines too in general, to look for information. Today, all that these products do is suggest instead of catering results. I believe this is one of the reasons people, or at least I, do not wish to use assistants. It feels that a computer is controlling my likes, dislikes and wishes while it should actually be me who controls computers.
It's certainly not a solution looking for a problem. It's a great way to deal with a number of minor daily tasks. Checking the time, setting timers, checking the weather/AQI, playing music, checking news headlines, etc.

There are a lot of things it's really not good for and people have tried them all I'm sure. But where it fails the hardest is being able to increase sales volume for Amazon or increase ad revenue for Google - the only path to monetization seems to be to force it in - and THAT is what is dying.

And who the heck wants it in a ring or glasses??

I always wanted a voice assistant but there's no way I'm having big tech listen in on me and my family 24/7 just to have one. Most non-tech family members I talked with about this share my opinion. THAT is why these assistants are failing.

On the other hand, Mycroft sounds like something people would actually want to use provided that it can operate locally and doesn't send any data outside the home.

It works a lot better than nothing. I use Siri and Alexa every day. If Alexa goes away, I’ll use Siri more, or find another. Siri was a little slow to catch up.

I think the story that you read simply says that it’s hard to monetize. You are inferring more than what the story says.

Voice assistants are here to stay.

I eagerly await the day when I can simply say "respond to this post" and then begin writing with my voice.

"Okay, navigating to the nearest post office"
I want a voice assistant that passes my AI Turing test, if you will. I want it open sourced like Mycroft too though. I don't care for having 17 speakers that start talking when they think I was talking to them. I wasn't.

My needs for a smart speaker are not really passing a Turing test. I want them to be automations. They need to get some things that an AI would do right, but there is a large step between an assistant that can do specific things and an AI that can talk about anything.