| HN Mirror



	by gwern 995 days ago
	No, that's easy. We had the equivalent of that in GANs many years ago. If you've never seen GAN editing, here's a quick video: https://www.youtube.com/watch?v=Z1-3JKDh0nI (Background: https://gwern.net/face#reversing-stylegan-to-control-modify-... ) You just classify the latents and then you can edit them. These days, with pretrained models like CLIP, you don't necessarily even need a latent space: you can take a model which has been trained on sound/text descriptions, like AudioCLIP, prompt it with text like "vocal fry", and then the generated samples are subtly skewed to try to maximize similarity with "vocal fry". You put a slider on how much weight that skewing gets, and now you have a slider control to adjust properties of the voice from the AI. If something like this doesn't exist, it's obvious how to do it. (Even the realtime problem is being solved by figuring out how to train diffusion models to do a GAN-like single pass: https://arxiv.org/abs/2309.06380 )
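A rough illustration of the "classify the latents, then edit" idea, as a toy sketch rather than anything from the linked StyleGAN work. It assumes the attribute direction can be estimated as the difference between the mean latent of labelled-positive and labelled-negative samples, which is one simple choice among several. The names `attribute_direction`, `edit`, and `score`, and the 4-dimensional toy latents, are made up for the example. A real setup would pass the edited latent through the generator, which is left out here.

```python
import random

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def attribute_direction(latents, labels):
    """Unit vector from the mean negative latent to the mean positive latent."""
    pos = mean([z for z, y in zip(latents, labels) if y])
    neg = mean([z for z, y in zip(latents, labels) if not y])
    diff = [p - q for p, q in zip(pos, neg)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]

def edit(z, direction, strength):
    """Move latent z along the direction; strength plays the role of the slider."""
    return [zi + strength * di for zi, di in zip(z, direction)]

def score(z, direction):
    """Projection of z onto the direction (a stand-in for a classifier logit)."""
    return sum(zi * di for zi, di in zip(z, direction))

# Toy data (assumption): the attribute is carried by dimension 0 by construction.
rng = random.Random(0)
latents, labels = [], []
for _ in range(200):
    has_attr = rng.random() < 0.5
    z = [rng.gauss(0, 1) for _ in range(4)]
    z[0] += 2.0 if has_attr else -2.0
    latents.append(z)
    labels.append(has_attr)

direction = attribute_direction(latents, labels)
z = [0.0, 0.5, -0.3, 1.0]
for strength in (-2.0, 0.0, 2.0):
    edited = edit(z, direction, strength)
    print(strength, round(score(edited, direction), 3))
```

Because the direction is normalised, each unit of slider strength shifts the projection score by exactly one unit, which is what makes it behave like a linear control. The CLIP-style variant would replace the labelled-latent direction with a similarity score against a text prompt, with the slider scaling how strongly sampling is pushed toward it.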

1 comment

techdragon 995 days ago

I never really got to explore the GAN generation of ML work, since I had no supported hardware (and no desire to support the NVIDIA monopoly on ML work) and refused to blow money on cloud instances I’d probably forget about at some point and wind up with a giant bill.

It’s a really different world now: I’ve got massive models running on my laptop thanks to Apple Silicon and its unified memory architecture, and the C++ ports of various diffusion image models and several families of large language models work well on my AMD GPU too… it’s so much easier to participate in the current generation of applied ML work without having to go out of my way to get hardware with specific ML support.