Show HN: The first consumer AI that can make any type of phone call (getvibrato.com)
7 points by gangster_dave 936 days ago
Good morning HN,

Today we’re launching a free public beta of Vibrato, a consumer AI assistant that makes phone calls for you. The calls can range from simple, like booking an appointment, to complex, like negotiating a lower utility bill. As far as I know, this is the first time consumers have been able to make general-purpose phone calls with AI.

We’re launching with two major features:

- a task library, where you can immediately have Vibrato perform a specific task that we’ve vetted, like making a doctor’s appointment or reducing your Comcast cable internet bill.

- a GPT for ChatGPT (Plus subscription required), where you can chat with ChatGPT and define a completely custom phone call, beyond what we offer in our task library. For example, you can use this to make a phone call to a law office for legal advice.

Vibrato is already wildly effective at handling outgoing phone calls. I probably feel this pain point more than most, but I’ve used it to:

- Negotiate lower utility bills

- Refill prescriptions

- Get updates on insurance claims

- Call doctor’s offices

A low-latency voice app with this level of complex reasoning has only been possible for a few weeks, since the launch of GPT-4 Turbo. So this is truly a milestone moment.

Please check it out if you have a moment. I’m happy to hear any feedback.

Thanks,

David

david@getvibrato.com


Are you doing any extra work to get low latency? Or is it just that those three API calls (speech-to-text, text-to-response, response-to-voice) have gotten much faster on the third-party side?
The latency depends on those three APIs, but the biggest bottleneck is the GPT-4 API. Its latency varies throughout the day, from under 200 ms to over 1 s. Vibrato has several application-level optimizations, like managing streaming audio and streaming text, but these aren't as impactful as the API latencies.
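As a rough latency budget, and assuming the three calls run one after another for each conversational turn (the comment does not say exactly how Vibrato chains them), the delay before the other party hears a reply is approximately the sum of the stages, with the GPT-4 term spanning the range quoted above:

```latex
T_{\text{reply}} \approx T_{\text{STT}} + T_{\text{LLM}} + T_{\text{TTS}},
\qquad
\Delta T_{\text{reply}} \gtrsim 1\,\text{s} - 0.2\,\text{s} = 0.8\,\text{s}
```

With the speech-to-text and text-to-speech terms held fixed, the day-to-day swing in the GPT-4 term alone moves the reply delay by 800 ms or more, which is consistent with application-level streaming tweaks mattering less than the API's own latency.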
I wonder if there's some way to stream the GPT-4 response into the text-to-speech API and then stream that voice to the user. I don't think OpenAI's TTS API allows this, but if some API (or a self-hosted model) could, you could give the appearance of being faster.
I don't think you can do that quite yet, since the TTS APIs need a full phrase to produce fluent-sounding speech. If the input is too short, the delivery, emotion, and pauses vary randomly from word to word (or token to token). I actually think that kind of system will become possible once we have a multimodal model that understands and outputs speech with the intelligence of GPT-4.
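One middle ground between sending every streamed token to TTS (which the reply says sounds erratic) and waiting for the whole response is to buffer the streamed text and hand it to TTS one phrase at a time. This is a minimal sketch under stated assumptions, not how Vibrato or any TTS vendor actually works: `chunk_phrases`, `_first_cut` and `min_chars` are hypothetical names, a phrase boundary is assumed to be `.`, `!`, `?`, `;` or `:` followed by whitespace, and no real LLM or TTS API is called. The input is simply an iterable of text deltas, as a streaming completion might yield them.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumed phrase boundary: sentence/clause punctuation followed by whitespace.
# Requiring the whitespace means we wait for the next delta before cutting,
# so a number split across deltas ("$49." then "99") is not broken up.
_BOUNDARY = re.compile(r"[.!?;:](?=\s)")


def _first_cut(buffer: str, min_chars: int) -> Optional[int]:
    """Return the end index of the first boundary giving a phrase of >= min_chars, if any."""
    for match in _BOUNDARY.finditer(buffer):
        if match.end() >= min_chars:
            return match.end()
    return None


def chunk_phrases(deltas: Iterable[str], min_chars: int = 20) -> Iterator[str]:
    """Group streamed LLM text deltas into phrase-sized chunks for a TTS request.

    Hypothetical helper: it only regroups text and never calls an LLM or TTS API.
    """
    buffer = ""
    for delta in deltas:
        buffer += delta
        cut = _first_cut(buffer, min_chars)
        while cut is not None:
            yield buffer[:cut].strip()
            buffer = buffer[cut:].lstrip()
            cut = _first_cut(buffer, min_chars)
    # The stream has ended: whatever is left is the final phrase.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    deltas = ["Hi", " there", ". I'm calling", " about my", " internet bill", ". Can we", " lower it", "?"]
    for phrase in chunk_phrases(deltas):
        print(phrase)  # each line stands in for one TTS request
```

Each yielded phrase would go to whichever TTS service is in use while the model keeps streaming, so the listener can hear the first sentence before the full reply exists. The `min_chars` floor keeps very short fragments from reaching TTS on their own; abbreviations such as "Dr." can still trigger an early cut once the buffer is past that floor.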
This is an excellent idea for automating repetitive voice instructions at scale. Much needed for creative and busy people. All the best.
This seems like a game-changer for automation. Great idea!