Hacker News
by hereonout2 481 days ago
This is an interesting take, and I'd guess that the training data for this probably did use podcasts as a source.

Getting very realistic, real-world conversational training data for an AI would be hard. Only a subset of us appear on podcasts, radio or TV, and those who do probably all speak in a slightly artificial manner.

2 comments

When I commented on the unnatural cadence, it told me that it had been trained on podcasts, which does help explain the issue - some people tend to “live-edit” themselves when a conversation is being recorded, which leads to this staccato. It seems they need to find a better source of training data for more natural conversational speech.
I agree, I think it's probably very easy to find billions of hours of conversation on YouTube, but none of it is set up as training data with a good transcript.
Yep! It's public dialogue, intended for an audience, with a prepared topic, etc. Or it's actors imitating private dialogue, but again shaping it towards an audience.

AI agents like this are trying to recreate personal intimacy I guess, which does feel like it might be different somehow.