| HN Mirror



	by waynecolvin 4387 days ago
	Question: Can anybody explain how a synthetic singing voice is made? Especially things like pitch: a voice actor doesn't have to sing every syllable at every pitch, do they?

4 comments

mutagen 4387 days ago

There are several approaches. I'm not sure what this software uses; machine translation of the site suggests samples.

Pitch shifting samples is one way to do it. A singer is recorded singing a syllable and that is shifted up or down by software. Artifacts creep in relatively quickly, especially with something as nuanced as the human voice. A variety of pitches and syllables can be sampled and the pitch shifting manually tuned to minimize audible artifacts.
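As a rough illustration of the pitch-shifting idea, here is a minimal sketch that shifts a sample by reading it back faster or slower. It is the crudest possible method (it also changes the duration, and it is not what any particular product does); the function names and the sine-wave stand-in for a recorded syllable are assumptions made up for the example.

```python
import math

def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)

def resample_shift(samples, semitones):
    """Naive pitch shift: step through the sample at a different rate,
    using linear interpolation between neighbours. Duration changes too."""
    ratio = semitone_ratio(semitones)
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio              # fractional read position in the input
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(a + (b - a) * frac)
    return out

# Hypothetical stand-in for a recorded syllable: 0.1 s of a 220 Hz sine at 8 kHz.
RATE = 8000
recorded = [math.sin(2 * math.pi * 220 * n / RATE) for n in range(800)]

# Up one octave: same waveform shape, twice the frequency, half the length.
shifted = resample_shift(recorded, 12)
```

The shorter output and the moved resonances are part of why artifacts show up quickly; real tools use more careful methods that keep duration and vocal character closer to the original.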

Modeling could also be used, from simplistic models not far removed from the ADSR envelopes of basic synthesis to advanced physically based models. Samples and modeling could be combined to expand the palette of syllables.
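For anyone unfamiliar with ADSR envelopes, a minimal sketch follows, assuming a piecewise-linear shape with stage lengths given in samples. The function name and parameters are hypothetical, picked only for this example.

```python
def adsr(n, attack, decay, sustain_level, release):
    """Return n gain values in 0..1.

    attack, decay and release are stage lengths in samples;
    the sustain stage fills whatever is left of n.
    """
    sustain_len = max(0, n - attack - decay - release)
    env = []
    # Attack: ramp from 0 up towards 1.
    env += [i / attack for i in range(attack)]
    # Decay: fall from 1 down to the sustain level.
    env += [1.0 - (1.0 - sustain_level) * i / decay for i in range(decay)]
    # Sustain: hold steady.
    env += [sustain_level] * sustain_len
    # Release: fade from the sustain level towards 0.
    env += [sustain_level * (1.0 - i / release) for i in range(release)]
    return env[:n]

# 100-sample envelope: 10 attack, 20 decay, hold at 0.5, 30 release.
envelope = adsr(100, 10, 20, 0.5, 30)
```

Multiplying an oscillator's output by `envelope` sample by sample shapes its loudness over time, which is about as simple as a "model" of a sung note gets.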

Our ears and neural processing of speech and singing are finely tuned to process subtle shades of difference, so any technique often sounds artificial. Fortunately this can be exploited musically, and great music can be made with these 'artificial' sources.


bdonlan 4387 days ago

UTAU and Vocaloid do indeed use pitch-shifted samples (UTAU even lets you build your own sample libraries). A more recent product, CeVIO, uses modeling IIRC.


JonnieCache 4387 days ago

Such things do exist, with much more than single voice actors too: http://www.youtube.com/watch?v=oyijUC1g_yg

Voice "synthesizers" generally use specially developed algorithms. See: https://en.wikipedia.org/wiki/Speech_synthesis#Formant_synth...


chillingeffect 4387 days ago

It's a combination of several oscillators and a complex resonator model.

The oscillators include the vocal cords as well as a model of the lips for fricatives, plosives, etc.

The resonator includes several major tunable cavities, from the lungs to the trachea to the sinuses, nasal cavities and mouth. These resonators form filters called formants, which have default configurations for every vowel sound; however, they are highly customized for every singer and express a terrific degree of nuance. Synthesis requires a multi-dimensional score somewhat like a speech synthesizer. The score can be dimension-reduced, but it will sound like crap. I would expect it to take about as much time to enter the data as it would to learn to perform it.
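A heavily simplified sketch of the oscillator-plus-resonator idea, assuming an impulse train as the "vocal cord" source and a cascade of standard two-pole resonators as the formant filters. The formant frequencies and bandwidths are rough illustrative values for an "ah" vowel, not measurements of any singer, and all names here are hypothetical.

```python
import math

RATE = 16000  # samples per second (assumed for this sketch)

def impulse_train(f0, n):
    """Crude source: one impulse per pitch period (period rounded to whole samples)."""
    period = int(RATE / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, freq, bandwidth):
    """Two-pole resonator centred on `freq` Hz, scaled for unity gain at DC."""
    c = -math.exp(-2 * math.pi * bandwidth / RATE)
    b = 2 * math.exp(-math.pi * bandwidth / RATE) * math.cos(2 * math.pi * freq / RATE)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# (centre frequency Hz, bandwidth Hz): illustrative values only.
VOWEL_AH = [(730, 90), (1090, 110), (2440, 170)]

def sing_vowel(f0, formants, n):
    """Run the source through each formant resonator in turn."""
    s = impulse_train(f0, n)
    for freq, bw in formants:
        s = resonator(s, freq, bw)
    return s

# Same vowel at two pitches: only the source changes, the formants stay put.
low = sing_vowel(110, VOWEL_AH, 1600)
high = sing_vowel(220, VOWEL_AH, 1600)
```

This is why a model does not need a recording at every pitch: pitch lives in the source, vowel identity in the filters. It also shows the dimensionality problem, since a convincing voice would need every one of these numbers (and many more) to vary continuously over time.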


ohwp 4387 days ago

I always like this Vocaloid opera song: https://www.youtube.com/watch?v=VseHlKR4Ew8 (Voi che Sapete)

While the song is playing you can see the settings. I've played with it once but it's hard to get it right.

First you place the notes to set pitch and length. Then you attach phonetic codes to the notes. So it's not like you are adding words to notes; it's all about how it should sound. You can then also add things like amplitude settings.
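To make that workflow concrete, here is a minimal sketch of what such a score might look like as data. This is not Vocaloid's actual file format; the `Note` class, its fields and the phonetic strings are all made up for illustration. The only standard piece is the MIDI-note-to-frequency formula (A4 = note 69 = 440 Hz).

```python
from dataclasses import dataclass

@dataclass
class Note:
    midi_pitch: int        # pitch as a MIDI note number
    beats: float           # length in beats
    phonemes: str          # hypothetical phonetic code, not a real product's symbols
    amplitude: float = 1.0

def midi_to_hz(m):
    """Equal-temperament frequency for MIDI note m (69 -> 440 Hz)."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)

def render_plan(score, bpm):
    """Turn a list of notes into (start_s, duration_s, hz, phonemes, amplitude) rows."""
    seconds_per_beat = 60.0 / bpm
    t = 0.0
    plan = []
    for note in score:
        dur = note.beats * seconds_per_beat
        plan.append((t, dur, midi_to_hz(note.midi_pitch), note.phonemes, note.amplitude))
        t += dur
    return plan

# Three notes: pitch and length first, then sounds, then an amplitude tweak.
score = [
    Note(65, 1.0, "v o i"),
    Note(67, 0.5, "k e", amplitude=0.8),
    Note(69, 1.5, "s a"),
]
plan = render_plan(score, bpm=120)
```

A synthesis engine would then take each row and produce audio for that sound at that frequency, by whichever method (samples or modeling) it uses.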
