by grenoire 2235 days ago
Can anybody explain why the researchers are attempting to generate the whole song as a single waveform, as opposed to feeding generated MIDI into some instruments and handling the vocals separately with a singing algorithm (which is perhaps a bit easier than generating everything at once)?

We did work last year on MIDI alone - https://openai.com/blog/musenet/ and some early work now on conditioning the raw audio based on MIDI (early results at the bottom of the Jukebox blog). Agreed, though, that there should be interesting results from modeling different blends of MIDI, stem, and raw audio data. Raw audio alone gives us the most flexibility in terms of the kinds of sounds we can create, but it's also the most challenging to get good long-term structure. Still lots more work to be done!
Something like MOD/XM music comes to mind.
It's very hard to express all the nuances of real music and tonality in MIDI, so generating raw audio sidesteps all the limitations of a MIDI intermediary. And IMO, the results are absolutely phenomenal!

(BTW, there are lots of AI music generators that generate MIDI, so it's less interesting either way.)
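To make the MIDI limitation concrete: a MIDI note-on event carries an integer note number (0 to 127), and under the common equal-temperament convention with A4 = 440 Hz at note 69, its pitch is f = 440 · 2^((n − 69)/12). Anything between those semitone steps, such as a sung glide or a slightly flat note, has to be expressed with separate pitch-bend messages or is simply lost. A minimal sketch, assuming that tuning convention and ignoring pitch bend entirely; the function names are hypothetical, not from any MIDI library.

```python
import math

A4_NOTE = 69      # MIDI note number conventionally mapped to A4
A4_FREQ = 440.0   # reference pitch in Hz (assumed tuning)


def midi_to_freq(note: int) -> float:
    """Frequency in Hz of an integer MIDI note under 12-tone equal temperament."""
    if not 0 <= note <= 127:
        raise ValueError("MIDI note numbers range from 0 to 127")
    return A4_FREQ * 2 ** ((note - A4_NOTE) / 12)


def nearest_midi_note(freq: float) -> int:
    """Round an arbitrary frequency (Hz) to the closest MIDI note number."""
    return round(A4_NOTE + 12 * math.log2(freq / A4_FREQ))


def cents_off_grid(freq: float) -> float:
    """How far, in cents, a frequency sits from its nearest MIDI note."""
    note = nearest_midi_note(freq)
    return 1200 * math.log2(freq / midi_to_freq(note))


if __name__ == "__main__":
    # A slightly flat sung A4 at 435 Hz: the nearest note is still 69,
    # so a bare note-on event would discard the deviation.
    sung = 435.0
    print(nearest_midi_note(sung), round(cents_off_grid(sung), 1))
```

Here a 435 Hz note rounds to note 69 and sits roughly 20 cents flat, which is exactly the kind of detail raw-audio generation keeps and a plain note-on event drops.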

Well, it's not MIDI, but what you're describing is similar to this approach:

https://magenta.tensorflow.org/ddsp
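Roughly, DDSP has a neural network predict control signals (such as a fundamental frequency and per-harmonic amplitudes) that drive classic synthesizer components like a harmonic oscillator bank, rather than predicting every audio sample directly. The snippet below is a minimal stdlib sketch of just the additive (harmonic) synthesis step, assuming a constant f0 and fixed amplitudes; in DDSP those controls vary over time and come from a trained model, and there are further components such as a filtered-noise synthesizer. `harmonic_synth` is a hypothetical name, not part of Magenta's API.

```python
import math


def harmonic_synth(f0: float, amplitudes: list[float], duration: float,
                   sample_rate: int = 16000) -> list[float]:
    """Additive synthesis: a sum of sinusoids at integer multiples of f0.

    amplitudes[k] is the gain of harmonic k + 1. Harmonics at or above the
    Nyquist frequency are dropped to avoid aliasing.
    """
    nyquist = sample_rate / 2
    n_samples = int(duration * sample_rate)
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        sample = 0.0
        for k, amp in enumerate(amplitudes, start=1):
            freq = k * f0
            if freq >= nyquist:
                break  # higher harmonics are even further above Nyquist
            sample += amp * math.sin(2 * math.pi * freq * t)
        out.append(sample)
    return out


if __name__ == "__main__":
    # Sawtooth-like timbre: harmonic k gets amplitude 1/k.
    wave = harmonic_synth(220.0, [1.0 / k for k in range(1, 21)], duration=0.5)
    print(len(wave))
```

The appeal of this design is that the network only has to learn a handful of interpretable controls per frame, while the oscillator bank guarantees the output is harmonic.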