by p1necone 385 days ago
All of the examples sound like people doing scripted radio ad reads rather than natural speech. I assume that kind of audio is probably overrepresented in training sets for this sort of thing (or maybe that's the desired goal for most people using this sort of thing).
1 comment

Training "high" points in voice inflection has been the priority; we've seen this in the 4o voice outputs and, to some degree, in the Google NotebookLM podcast outputs. I would assume it's because they're trying to make it "act", but now the problem is that it swings too far toward one end of the spectrum.