by themacguffinman 672 days ago
AI voice agents are weird to me because voice is already a very inefficient and ambiguous medium. The only reason I would make a voice call is to talk to a human who is equipped to tackle the ambiguous edge cases that the engineers didn't already anticipate.

If you're going to develop AI voice agents to tackle pre-determined cases, why wouldn't you just develop a self-serve non-voice UI that's way more efficient? Why make your users navigate a nebulous conversation tree to fulfill a programmable task?

Personally when I realize I can only talk to a bot, I lose interest and end the call. If I wanted to do something routine, I wouldn't have called.

4 comments

Think 1-800-CONTACTS not Siri. Call centers are super expensive and the user experience is usually pretty bad. There's a huge incentive to move to voice agents, but one of the challenges is building a framework to adequately test it. That seems to be what this is focused on.
If I understand correctly, it’s the pushback on the call center in general when using AI agents. Why go through the trouble of calling at that point instead of fixing my issue some other way?

For example, when I need to activate a new SIM card, I need to call the company to get it activated. But if I’m talking to an AI agent at that point, why not have me go through another channel (website/app?) to activate it?

Some people (like me) are primarily verbal processors:

- I am dictating this message through macOS's voice to text right now

- I am a huge user of Google Assistant

- I prefer to call people versus texting them

- I tend to call restaurants instead of using something like Toast to order takeout (although this is partially because online services will add a surcharge onto the price sometimes, and sometimes I need to ask questions about dietary restrictions, etc.)

Generally, wherever possible, I will use a voice interface versus a text based one to get my point across. It's just faster and more convenient for me. I'm pretty neutral on the consumption side: I read and listen to audiobooks in roughly equal amounts.

All that to say that, just like there are people out there who prefer text UIs, there are also people who prefer voice interfaces.

I use Superwhisper (no affiliation, just a happy user), which runs a local Whisper model, to create most of my email drafts and post-meeting notes. I find Whisper more accurate than Mac’s built-in speech-to-text, plus I’m faster at speaking than typing.

Sometimes, I even ‘talk’ into Cursor’s chat window instead of typing. The only downside? It can get a bit annoying for others when you're talking to yourself all day.

I'm looking for something like this that runs on Linux. Best thing I've found is LiveCaptions, but its output is janky. I can't just use it to type in any old text field, and its output requires substantial editing after the fact.

I guess I understand that a lot of things are being developed for Apple silicon specifically. It's just frustrating that despite hours of searching, I'm not finding anything decent.

Talon Voice is good and runs on Linux.

https://talonvoice.com/

This looks really powerful for controlling the system with different scripts, but what if all I want it to do is let me narrate something and print out the sentences as close to real-time as possible? It's really just good STT that I'm looking for out of it.
The Talon Voice dev created his own STT model that's very performant. The transcription quality is... good, but not world-class. It's better than anything that came out before Whisper IMO, but the newest generation of models can do things like inferring punctuation and words outside of their vocabulary (although the downside of the new generation of VTT is that they can sometimes hallucinate words that are very different from what you said).

It's a bit overkill to use Talon for just voice dictation, but that is 90% of what I use it for, and it's pretty good at it.

Interesting! I'll give Superwhisper a try.
The example of "fast food drive-thru" really cleared this up for me.

Frankly I'm surprised there isn't already some sort of NFC info transfer system in fast food restaurants' apps that lets you and everyone in your car enter your order while you're waiting in line, then knows when your car is up and brings you the food. Have the voice part be a fallback tier, not the primary one.

My grocery store can know when I'm arriving and bring out my food, based on location services on my phone. So can Walmart or Home Depot. Granted, they make me wait a couple hours until they notify me that my order is "ready" before I come get it.

I suppose it's possible this does exist and I just haven't seen it because I don't drive through fast food restaurants, but I don't get why a place that primarily takes orders in real time and hands them out the window can't broaden the way to submit them to include on-site online orders as well as "talk to our agent over a glorified walkie talkie" orders.

Sort of related, but I just came off a RyanAir flight of all things, and they have something similar. Instead of talking to a stewardess to get a sandwich, I order it on the app and they bring it to me.

It worked quite well, which was surprising coming from RyanAir.

It's been a dream of mine for some years now. They're pretty dumb still today.
I'm the same way, and I don't have any data on this, but it's possible that we're in the minority. This probably isn't the case, but hopefully anyone implementing such a system has thought through whether it will actually provide any value.

For example, if you had an existing IVR system and you tracked menu options and found that a significant portion of calls were able to be answered by non-smart pre-recorded messages, upgrading to an AI voice agent could be a reasonable improvement.
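To make that check concrete, here is a rough sketch of how you might measure it, assuming you can export a call log with one row per call. The log layout, the menu option names, and the "recording"/"agent" outcome labels are all made up for illustration and don't come from any particular IVR product.

```python
from collections import Counter

# Hypothetical call log: (call_id, menu_option, outcome).
# Field layout and outcome labels are assumptions, not a real IVR export format.
CALLS = [
    (1, "store_hours", "recording"),
    (2, "billing", "agent"),
    (3, "store_hours", "recording"),
    (4, "order_status", "recording"),
    (5, "billing", "agent"),
    (6, "order_status", "agent"),
]


def recording_share(calls):
    """Return the fraction of all calls that ended at a pre-recorded message."""
    if not calls:
        return 0.0
    resolved = sum(1 for _, _, outcome in calls if outcome == "recording")
    return resolved / len(calls)


def share_by_option(calls):
    """Return {menu_option: fraction of that option's calls ended by a recording}."""
    totals = Counter(option for _, option, _ in calls)
    resolved = Counter(
        option for _, option, outcome in calls if outcome == "recording"
    )
    return {option: resolved[option] / totals[option] for option in totals}


if __name__ == "__main__":
    print(f"overall: {recording_share(CALLS):.0%}")
    for option, share in sorted(share_by_option(CALLS).items()):
        print(f"{option}: {share:.0%}")
```

If the overall share is high, and especially if it's concentrated in a few menu options, those options are the obvious candidates to hand to a voice agent (or a web form) first.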

Our customers, who build voice agents, are often asked by their customers to make their voice agents more human-like and flexible. Their clients — businesses like pest control and automotive repairs — value providing a personalized experience but want the convenience and reliability of a 24/7 booking and answering service.
Can I upgrade to a web form instead?