by abdljasser2 556 days ago
Good question. In my experience, combining generic descriptors is what works best. This is probably because the text captions used during training mostly consist of generic instrument names, genre names, and adjectives.
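To make that concrete, here is a rough sketch of what I mean by combining descriptors. The helper name `build_prompt` and the comma-separated format are just my own assumptions for illustration, not anything the model requires, so adjust to whatever your tool expects.

```python
def build_prompt(genre, instruments, adjectives):
    """Join generic descriptors into one comma-separated text prompt.

    Hypothetical helper: the ordering (adjectives, genre, instruments)
    is an arbitrary choice, not a documented requirement of any model.
    """
    parts = [*adjectives, genre, *instruments]
    # Drop empty entries and normalise whitespace and case.
    cleaned = [p.strip().lower() for p in parts if p and p.strip()]
    return ", ".join(cleaned)


prompt = build_prompt("jazz", ["piano", "upright bass"], ["mellow", "slow"])
print(prompt)  # mellow, slow, jazz, piano, upright bass
```

The point is to stick to plain genre names, instrument names, and adjectives rather than long, specific sentences.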