| HN Mirror



	by p1necone 385 days ago
	All of the examples sound like people doing scripted radio ad reads rather than natural speech. I assume that kind of audio is overrepresented in training sets for this sort of thing (or maybe that's the desired result for most people using it).

1 comment

horhay 378 days ago

Training "high" points in voice inflection has been the priority; we've seen this in the 4o voice outputs and, to some degree, in the Google NotebookLM podcast outputs. I would assume it's because they're trying to make it "act", but now the problem is that it swings too far toward one end of the spectrum.
