by ademeure 943 days ago
Am I the only one who doesn't like these specific voices? The quality is incredible, but they feel too cheery/enthusiastic/casual and it just gets annoying after a while.

I made an iOS shortcut a while ago that uses Siri with the ChatGPT app (it has iOS shortcut bindings) and despite Siri being a useless pile of junk compared to this, I actually prefer Siri's voice to this in some ways, because it doesn't feel so over the top.

Maybe this is partly because of different cultural expectations between the USA and Europe? Or maybe I'm just being too cynical and ChatGPT really is that happy talking with me!...

5 comments

Nope, you're not the only one. I posted as well: they sound to me like your classic, well-trained call center agents: Fake friendly (but please kill me now).

Reminds me way too much of some of the people I had to talk to, when cleaning up my mother's affairs. Places trying to get me to pay bills I did not owe, call center agents "cheerfully" following scripts that they themselves hated. The voices sound exactly like that.

Give me a neutral voice. This is a computer I'm talking to, not a fake friend.

It is very much a cultural thing, the voice equivalent of decorating your Instagram. Ordering pizza after a 60h work week? Well, better make it sound like fun!!

Of all of them, Sky's voice seems the most sober. I've been using it as a Plus subscriber for several weeks now and am also very impressed.

Yes, sometimes it thinks I'm done speaking when I'm not, but on the whole it's very good. Siri, Alexa, et al. are not only unusable but are now supremely frustrating.

I don’t care about the voices themselves but the speech recognition is borderline unusable sometimes. It interjects when it shouldn’t and will frequently hear things incorrectly.

At one point it misinterpreted me mentioning “tai chi” as “I can’t breathe” and responded with advice about medical emergencies.

Do you mean Siri's voice recognition? If so, 100% agreed. My iOS shortcut uses OpenAI's Whisper API for voice recognition, and Siri (English United Kingdom - Siri Voice 1) for text to speech.

I really like dictating things sometimes, and Whisper is perfect for that (automatic paragraphs inside the model itself would be nice but not a big deal).

If anyone is interested - the "Whisper speech recognition in iOS" part is based on this shortcut I found that you can easily use yourself on both iOS and macOS (free except for the OpenAI API usage fees obviously): https://giacomomelzi.com/transcribe-audio-messages-iphone-ai...

No, I mean the voice recognition in ChatGPT.

> free except for the OpenAI API usage fees

There are several versions of Whisper which have been distilled and can run locally, so I don’t see what advantage API calls would offer. All they add is increased latency and decreased reliability and data security.

That's really interesting. Whisper is generally considered the current state of the art in STT, and I've personally never experienced errors like the ones you describe. I've actually never had an error from Whisper.

First question: is there another STT you have used which works better for you?

Second question: is there any reason your voice might be considered unusual, like having a strong Welsh, Irish, or Indian accent, or being Deaf or Hard of Hearing?

Yeah, Whisper is pretty good out of the box in my experience, but the vast majority of the time I’m using it in my car. So the conditions aren’t ideal, or are out of distribution for Whisper. However, CarPlay is detectable and common enough from what I’ve heard.

Second, even if the transcription is correct, it cuts me off at inappropriate times. It’s hard to talk naturally without pauses.

I haven’t used a better transcription model, no.

Oh that's really interesting. Probably an acoustic environment it's not used to, like you said, but also people talk differently when they're driving. Like the cadence of our speech is significantly different because of the way our mental focus changes. I have to imagine that changes some things.

It is probably cultural or linguistic. I love audio books but I cringe when I find a book I want to listen to that has an English voice actor. I don't think it is just the accent but all the pacing and emphasis.

Though I also don't like most of the ChatGPT voice models besides Sky. Sky to me is really good. Robertson Dean reading an audio book is perfection, but Sky is pretty awesome.

I should add that, as an American, there are a ton of American voice actors who ruin books for me too. Sometimes this can be fixed by playing at 1.2x speed.