The Transformer is working really well, and I'm very excited. I'll probably be sharing results soon. Yes, the attention makes a huge difference, and the pieces are both more creative and more coherent.
Have you considered using 'learning from human preferences' as the loss function, in addition to the Transformer? That was another OpenAI project, and it seems tailor-made for music generation: what is more 'I know it when I hear it' than music quality?