Hacker News
by bambax 1313 days ago
I'm not sure I understand what Vocaloid does? Does it generate vocal parts "from scratch" / just from lyrics? Or is it more like a vocoder?

The track you reference sounds like chipmunks sped up 2x; it's not unpleasant to listen to, and fun, but I feel it could be made just like that (record at 80bpm, high pass filter, maybe transpose 1 octave, and speed up to 180), no "AI" involved.
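As a rough check on those numbers, here is a minimal sketch of the arithmetic. It assumes a plain tape-style speed-up (resampling with no pitch correction), where pitch rises with the speed ratio. The function name is made up for illustration.

```python
import math

def speedup_pitch_shift(src_bpm: float, dst_bpm: float) -> tuple[float, float]:
    """Return (speed ratio, pitch shift in semitones) for a tape-style speed-up.

    Assumption: playback speed and pitch scale together, as with simple resampling.
    """
    ratio = dst_bpm / src_bpm
    semitones = 12 * math.log2(ratio)  # 12 semitones per doubling of frequency
    return ratio, semitones

ratio, semitones = speedup_pitch_shift(80, 180)
print(f"{ratio:.2f}x faster, {semitones:+.1f} semitones")
# prints "2.25x faster, +14.0 semitones"
```

Under that assumption, going from 80 to 180 bpm is 2.25x rather than 2x, and the speed-up alone already raises the pitch by a bit more than an octave, so an extra one-octave transpose would stack on top of that.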

7 comments

It's an instrument: you have a piano roll interface, draw in your melody like editing MIDI in a DAW, and add lyrics to each note (usually with some manual phoneme fine-tuning), and it outputs a stream of vocal audio.
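To make the piano roll idea concrete, here is a small sketch of what the input data could look like: notes with a position, a length, a pitch, a lyric, and an optional phoneme override. This is a hypothetical structure for illustration only, not Vocaloid's actual project format, and the `Note` class and helper names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Note:
    start_beat: float        # position on the piano roll, in beats
    length_beats: float      # note length, in beats
    midi_pitch: int          # MIDI note number, 69 = A4
    lyric: str               # syllable attached to this note
    phonemes: list[str] = field(default_factory=list)  # optional manual fine-tuning

def note_frequency_hz(midi_pitch: int) -> float:
    """Equal-tempered frequency for a MIDI note number (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((midi_pitch - 69) / 12)

def note_times_seconds(note: Note, bpm: float) -> tuple[float, float]:
    """Start and end time of a note in seconds at the given tempo."""
    sec_per_beat = 60.0 / bpm
    start = note.start_beat * sec_per_beat
    end = (note.start_beat + note.length_beats) * sec_per_beat
    return start, end

# Two notes: the second one carries a hand-edited phoneme list.
melody = [
    Note(0, 1, 69, "la"),
    Note(1, 1, 71, "la", ["l", "a"]),
]
for n in melody:
    print(n.lyric, round(note_frequency_hz(n.midi_pitch), 1), note_times_seconds(n, 120))
```

The synthesizer's job is then to turn each of those entries into sung audio at the right pitch and time, which is the part this sketch leaves out.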

Human Japanese singers, especially women, tend to operate in a higher octave range than what is common in the west. It's slightly culturally insensitive to take shots at vocal pitch when talking about J-Pop. Pitch is largely a social/cultural construct, and Japan generally leans into the idea of higher pitch -> polite or cute and lower pitch -> aggressive or rude. (e.g. you raise your pitch when talking to your boss, and drop it to express your disgust with someone.) Just putting that out there, not trying to be accusatory or anything. It's just always good to keep in mind that western cultural norms are hardly universal.

Ah, thanks for the heads up!

For the record, I was responding to the gp saying

> ... pushing the boundaries of pop music in a way that wouldn't be possible with a real singer

=> I felt it was possible to do what the example does by singing slowly and speeding it up afterwards.

The chipmunk effect isn't even necessarily part of it. Most vocaloid music is in a more "normal" range.

It's a synthesizer. It's an alternative to human singers. I can imagine someone seeing a digital piano for the first time. "I'm not sure what it even does. I could just use an acoustic piano. It sounds the same."

Yeah, fair enough. I have no problem with Vocaloid -- but I do have a problem with the over-the-top marketing copy (sorry if that was unclear).
As for the chipmunk sound, it's not unusual for female j-pop vocalists to operate one or two octaves higher than the unfamiliar western ear would generally consider pleasant.

There's also plenty of music directly derivative of the vocaloid scene that maintains a similar aesthetic with 'organic' vocalists and dispenses with some of the awkwardness of vocaloid-oriented compositions. Example: https://www.youtube.com/watch?v=hjJMIWyl_l4

A slightly more natural sounding track for reference: https://youtu.be/9vyIPWBeRes

This one is an official track for a popular vocaloid rhythm game.

Also, at this point the “chipmunk” sound is part of the brand and will be kept to some extent for tracks labelled as vocaloids (it’s kind of a market on its own)

It generates audio from phonetic lyrics.
you write down phonemes on a DAW, and it synthesizes the voice for you. you also put down vibrato or other modifiers like you would for most other instruments.

the audio is generated from a voicebank that is a database of prepared phonemes recorded from a voice actor. some packages come with multiple variants of voicebanks, like you could have a "soft" voice and a "vivid" voice.
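a toy sketch of the voicebank idea, assuming the simplest possible scheme: look up each phoneme's recorded samples in the chosen variant and join them end to end. the sample values, phoneme symbols, and function name are placeholders, and a real engine would also handle pitch, timing, and smoothing between phonemes, none of which is shown here.

```python
# Toy voicebanks: variant name -> phoneme -> pre-recorded samples.
# All values are placeholders, not real audio data.
VOICEBANKS = {
    "soft":  {"k": [0.1, 0.0], "a": [0.2, 0.3, 0.2]},
    "vivid": {"k": [0.4, 0.1], "a": [0.6, 0.8, 0.6]},
}

def render(phonemes, variant="soft"):
    """Concatenate the stored samples for each phoneme in order."""
    bank = VOICEBANKS[variant]
    audio = []
    for p in phonemes:
        audio.extend(bank[p])  # raises KeyError if the bank lacks this phoneme
    return audio

print(render(["k", "a"]))            # [0.1, 0.0, 0.2, 0.3, 0.2]
print(render(["k", "a"], "vivid"))   # [0.4, 0.1, 0.6, 0.8, 0.6]
```

same phoneme sequence, different variant, different output: that is all "soft" vs "vivid" means in this sketch.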

Look at the upload date on the video. It's 2 years old. No AI was involved, as that's being touted as a new feature of V6.