Hacker News
by rexreed 1279 days ago
"fine-tuned on images of spectrograms paired with text"

How many image/text pairs did you train on, and what was the source of your training data? Just curious how much fine-tuning was needed to get these results, and how broad the set of original sources had to be to get sufficient musical diversity.