Hacker News
by boreas 2062 days ago
Hope this gets a response - I don't see a paper.

If there is a paired dataset, I guess this could be "easy". The StyleGAN "input" is essentially used to control parameters at various stages within the network, so you could adjust them one at a time, or on varying schedules, to get that sort of gradual effect.
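To make the "varying schedules" idea concrete, here is a rough sketch in plain Python. It assumes each stage of the network takes its own style vector and that you have two sets of them to blend between; the function names and the staggered linear ramp are made up for illustration and are not taken from StyleGAN's code or from this project.

```python
# Hypothetical sketch: blend two sets of per-layer style vectors on
# staggered schedules. Layers are plain lists of floats, not a real
# StyleGAN API.

def lerp(a, b, t):
    """Linear interpolation between two equal-length vectors."""
    return [x + (y - x) * t for x, y in zip(a, b)]

def layer_weight(frame, n_frames, layer, n_layers):
    """Blend weight in [0, 1] for one layer at one frame.

    Layer k starts moving at frame k * n_frames / n_layers and ramps
    linearly, so early (coarse) layers change before late (fine) ones.
    """
    span = n_frames / n_layers
    start = layer * span
    t = (frame - start) / span
    return max(0.0, min(1.0, t))

def scheduled_styles(styles_a, styles_b, frame, n_frames):
    """Per-layer styles for one frame, moving from styles_a to styles_b."""
    n_layers = len(styles_a)
    return [
        lerp(a, b, layer_weight(frame, n_frames, k, n_layers))
        for k, (a, b) in enumerate(zip(styles_a, styles_b))
    ]

if __name__ == "__main__":
    a = [[0.0, 0.0] for _ in range(4)]
    b = [[1.0, 1.0] for _ in range(4)]
    for frame in range(0, 9, 2):
        print(frame, scheduled_styles(a, b, frame, n_frames=8))
```

Halfway through, the first two layers have fully switched while the last two have not started, which is one way to get a gradual, layer-by-layer transition. Any other schedule (all layers together, fine layers first, driven by an external signal) would slot into `layer_weight`.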

I know that randomly instantiated neural networks can produce some pretty trippy image transformations as well, so maybe there is a way to bootstrap without paired data.

Lastly, I doubt this is on the right track, but it would be cool if you could produce appropriately styled images with a "compression" approach, i.e., trying to fit the audio information into some small, visually meaningful latent space, and then using that to generate images.

edit: ok, just watched the first example, back to square one for me. It's literally pulling stuff from paintings.

1 comment

We're using audio analysis, and applying that to control the output of a Generative Adversarial Network (GAN) trained on a particular set of images, which define the visual theme.
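The comment above does not say which audio features are used or how they drive the GAN, so the following is only a guess at the general shape of such a coupling: a loudness envelope blends between two latent vectors. The RMS feature, the names `z_quiet` and `z_loud`, and the linear blend are all assumptions for illustration, and the generator itself is left out.

```python
import math

# Hypothetical sketch: turn a loudness envelope into a sequence of latent
# vectors. The feature choice and the mapping are assumptions, not the
# method described in this thread.

def rms_envelope(samples, window):
    """RMS loudness of each non-overlapping window of samples."""
    env = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        env.append(math.sqrt(sum(s * s for s in chunk) / window))
    return env

def normalize(values):
    """Scale values into [0, 1]; empty input stays empty, constant input maps to zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def latents_from_audio(samples, window, z_quiet, z_loud):
    """One latent per window, blended from z_quiet to z_loud by loudness."""
    env = normalize(rms_envelope(samples, window))
    return [
        [q + (l - q) * e for q, l in zip(z_quiet, z_loud)]
        for e in env
    ]

if __name__ == "__main__":
    samples = [0.0] * 4 + [1.0] * 4 + [0.5] * 4
    for z in latents_from_audio(samples, 4, [0.0, 0.0], [2.0, 4.0]):
        print(z)  # each z would be passed to a trained generator
```

Each resulting vector would then be fed to a generator trained on the chosen image set, one per video frame or per audio window.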

My co-founder has a YouTube channel where he will be going into technical detail on how this all works. He will be posting a video in the coming weeks. https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg

Thanks! I didn't notice the first example, where it's more apparent that paintings are used as source material. Are the other examples on the landing page produced from paintings or from some other original source? It would be an interesting task to try to design images knowing they will be used as the GAN's inspiration.

Also, out of curiosity, did you determine anything about the legality of training the GAN on copyrighted images and deciding its output is its own creative work, or are you using public domain images?