Hacker News
by thekevan 481 days ago
It's good, but it still sounds fake to me, just in a different way. The voice itself undoubtedly sounds like a human.

But the cadence and the rhythm of speaking are off. It sounds like someone who isn't a podcaster trying to speak in the personality of a podcaster. It just sounds like someone trying too hard and speaking in an unnatural way.

8 comments

I tried the demo and could tell it was fake in the first five seconds. IMO it sounds like it was trained on Northern California founders giving a pitch for their startup. Way too enthusiastic and trying too hard to sound natural.
For all they talk about diversity, you can pretty much pinpoint every tech product to SV because they are all using the same cultural cookie cutter.
I also think it didn't feel very "real". It tries too hard to sound upbeat and is too eager to please. Maybe it's just me being European, but it makes me go "ewww, that's not how normal people speak".
That's American office culture for you. If it was Australian it'd be drab, boring, and self-flagellating.
No, I think it's a sign of it being a fake human. It sounds more like someone trying to speak like an influencer or podcaster and not being very good at it.
Yeah, the eagerness-to-please thing feels like it carried over from the LLMs or something, because they're like that too.
It sounds like a "sales and marketing coordinator" for something very tech-bro adjacent after two strong cups of coffee.
Humans are extremely well tuned to detect authenticity in communication. Especially younger generations raised on mass marketing.

This is good in the way a sci-fi movie shows off a piece of tech: it sounds cool and demos futuristic possibilities. But it's not quite passing the real-human vibe check yet. Still, I'm sure some people might find it preferable to a more to-the-point system like GPT or Siri/Alexa in certain niche cases not requiring immediate gratification.

>Humans are extremely well tuned to detect authenticity in communication.

I think the long-standing success of advertising and propaganda suggests that people really aren't all that good at that.

I suspect success of advertising is less about people falling for deception and more about information availability. If you know nothing about two brands except you've heard the name of one 50 times in ads, you'll probably try it first.

I think propaganda is a better example, although again I think often people aren't deceived, they simply agree with the message or don’t care about the underlying truthfulness of the message and just use it as a way to align with their tribe, etc.

This is an interesting take, and I'd guess that the training data for this probably did use podcasts as a source.

Getting very realistic, real-world conversational training data for an AI would be hard. Only a subset of us appear on podcasts, radio or TV, and we probably all speak in a slightly artificial manner when we do.

When I commented on the unnatural cadence, it told me that it had been trained on podcasts, which does help explain the issue: some people tend to “live-edit” themselves when a conversation is being recorded, which leads to this staccato. It seems they need to find a better source of training data for more natural conversational speech.
I agree. I think it's probably very easy to find billions of hours of conversation on YouTube, but none of it is set up as training data with a good transcript.
Yep! It's public dialogue, intended for an audience, with a prepared topic, etc. Or it's actors imitating private dialogue, but again shaping it towards an audience.

AI agents like this are trying to recreate personal intimacy I guess, which does feel like it might be different somehow.

A few times the CEO of my company randomly joined me for lunch, but each time he forgot to leave behind his persona of "I'm a public speaker right now", making the whole situation feel extremely awkward. This AI gives me exactly the same vibes.
People have a performative mode and an authentic mode (oversimplifying), probably including you. If you're at home talking to your parents or spouse, and then suddenly realize your boss is in the next room listening, does your voice change?

Point being, this demo voice is in performative mode, and I think sounds fairly natural based on that. Would you rather it not?

It sounds like someone who is doing a microphone test for something they just bought and hearing themself on a delay from the monitoring.

Yes that is very specific, but that's what it sounds like to my ear.

To me the actual words it used also seemed fake, sort of too deliberately breezy.