by haykmartiros
1289 days ago
To be honest, we're not sure how much value image pre-training brings. We have not tried training from scratch, but it would be interesting. One thing that is very important, though, is the language pre-training. The model is able to do some amazing stuff with terms that do not appear in our dataset at all. It does this by associating them with related words that do appear in the dataset.
|
Why was Stable Diffusion able to generate spectrograms? Because it was fed some. Presumably, those original spectrograms were scraped with little concern for creators' permissions, just as artists' work has been scraped to produce art-like image generation. Please research what has been happening in the art community lately. https://www.youtube.com/watch?v=Nn_w3MnCyDY
A protest on ArtStation has been shown to influence Midjourney's results, proving that huge amounts of proprietary work are constantly scraped without the creators' permission. AIs like these work so well just because they steal and remix real artists' work in the first place. There are going to be legal wars about this.
Stable Diffusion doesn't have an official music generation AI precisely because it couldn't be trained with the same approach without being sued by music labels right away, while isolated artists don't have the same power.
So, back to my question: have you wondered whose work Stable Diffusion is remixing here? Your endeavour is great technically, but as we progress into the future we have to be more aware of the ethical implications that come with different forms of progress.
You could try to base your project on a collection of free-to-use spectrograms and see how it performs. If you do, I think it could actually be very interesting and useful to discuss the results here on Hacker News.
Cheers!