by midgetjones
3493 days ago
I'm interested that you used some Christmas songs for training (which wasn't obvious from what I read of the paper). Were they pop songs, traditional, or a mix? Further to my comment up there[0], and I don't wish to sound like a grinch because this is a really cool project, would I be right in thinking you spent more time on the image description than on the music? I saw that you specify a scale for the melody. Would it be possible either to use a mode to generate the accompaniment around, so that the melody can move diatonically without risking too many clashes, or to allow the melody to follow the chord sequence somehow? Again, sorry if I sound too critical. It's a really awesome thing you've done, and I'm just a guy who listens to the music instead of the lyrics.

[0] https://news.ycombinator.com/item?id=13079355
For lyrics, we actually didn't train on Christmas songs. The training data was a large collection of romance novels (see neural-storyteller by Jamie Kiros). The "Christmas trick" was to apply a "style shift" after image captioning and before lyrics generation, where the shift vector was obtained from ~30 Christmas songs.
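A minimal sketch of what such a style shift could look like, not the project's actual code. It assumes sentence embeddings are plain vectors and that the shift is "subtract the mean caption embedding, add the mean Christmas-song embedding", which is how neural-storyteller describes its own shift. The names `mean_vector` and `style_shift` and the toy two-dimensional vectors are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def style_shift(embedding, source_mean, target_mean):
    """Move an embedding away from the source style and toward the target style."""
    return [x - s + t for x, s, t in zip(embedding, source_mean, target_mean)]


# Toy data (assumption): real sentence embeddings would have thousands of dimensions.
caption_vecs = [[1.0, 0.0], [3.0, 2.0]]    # embeddings of plain image captions
christmas_vecs = [[0.0, 4.0], [2.0, 6.0]]  # embeddings of Christmas song lines

caption_mean = mean_vector(caption_vecs)      # [2.0, 1.0]
christmas_mean = mean_vector(christmas_vecs)  # [1.0, 5.0]

# Shift one caption embedding before handing it to the lyrics generator.
shifted = style_shift([2.0, 1.0], caption_mean, christmas_mean)
print(shifted)  # [1.0, 5.0]
```

The shifted vector would then be what the lyrics decoder conditions on, in place of the raw caption embedding.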
For the music generation: although we are aware of some basic rules of musical performance, such as the melody following the chords, we didn't actually add rules of this kind.
As for the blues scale, here's the thing: I didn't really know much about music, so I spent several hours reading things like basicmusictheory.com. It happened to introduce blues, so we just used it. But you're right about the relevance of blues to pop: when we ran the scale-checking code, only a very small percentage of our pop music collection turned out to be blues.
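For anyone curious what a scale check of this sort might involve, here is a rough sketch. It is not the actual scale-checking code; it assumes melodies are lists of MIDI pitch numbers, uses the six-note minor blues scale, and picks an arbitrary 95% threshold. All function names are hypothetical.

```python
# Semitone offsets of the six-note (minor) blues scale from its tonic:
# root, minor 3rd, 4th, diminished 5th, 5th, minor 7th.
BLUES_INTERVALS = (0, 3, 5, 6, 7, 10)


def blues_scale(tonic_pc):
    """Pitch classes (0-11) of the blues scale built on the given tonic."""
    return {(tonic_pc + interval) % 12 for interval in BLUES_INTERVALS}


def fraction_in_scale(midi_pitches, scale_pcs):
    """Share of notes whose pitch class belongs to the scale."""
    if not midi_pitches:
        return 0.0
    hits = sum(1 for pitch in midi_pitches if pitch % 12 in scale_pcs)
    return hits / len(midi_pitches)


def looks_like_blues(midi_pitches, threshold=0.95):
    """True if some blues scale covers at least `threshold` of the notes.

    The threshold is an assumption chosen for illustration only.
    """
    return any(
        fraction_in_scale(midi_pitches, blues_scale(tonic)) >= threshold
        for tonic in range(12)
    )


c_blues_melody = [60, 63, 65, 66, 67, 70]      # C, Eb, F, Gb, G, Bb
c_major_melody = [60, 62, 64, 65, 67, 69, 71]  # C, D, E, F, G, A, B

print(looks_like_blues(c_blues_melody))  # True
print(looks_like_blues(c_major_melody))  # False
```

Running something like this over a pop collection and counting the `True` results would give the kind of percentage mentioned above.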