Show HN: Vocal timing conditioned audio diffusion in real-time

	Show HN: Vocal timing conditioned audio diffusion in real-time (riffusion.com)
	8 points by haykmartiros 954 days ago
	We've been cooking up a new experiment where you can record yourself singing or talking and the app will generate vocals to match your words and timings. It's backed by an end-to-end latent diffusion model that generates audio conditioned on both the style and the lyric timings, and it's quite fast. Your actual voice and melody are not used, just the transcription, and we don't store the recording. We've found it's a really natural way to control the output you want and dream up a song concept. Curious to hear what you think!

3 comments

badFEengineer 954 days ago

I've been pretty bearish on gen AI for music, but this is the most fun I've had playing with an AI tool in a long time. The filters remind me of the OG Instagram filter effect, where even shitty photos from phones could "magically" be enhanced.

This + the Music ControlNet post from yesterday gives me some hope that audio AI will go the direction of creative tools, rather than dystopian full song generation.

ricepaddies3 954 days ago

I'm impressed with the quality of the sound! Some of my generations were for certain bops. I'm finding myself regenerating on "Surprise" just to see what the model can toss up.

Would it be possible for the model to generate based on the recorded melody in the future? It might also be cool to have increased controls, e.g. choose between male and female vocals, and things like that.

Super nice work!

sarawiltberger 954 days ago

Very cool! Is this the state-of-the-art music gen model out there?
