I wonder if training another net on top of the slices would work better than voting for a single winner. I'd presume that there are genres that are well characterized by the distribution and progression of their spectrograms. Probably expand/compress the collection of slices to a standard length before training?
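To make the idea concrete, here is a rough sketch of the "standard length" step, assuming the first net emits one feature or probability vector per slice. The names `resample_slices` and `majority_vote` are made up for illustration and are not from the article; linear interpolation is just one plausible way to stretch or compress the sequence.

```python
import numpy as np

def resample_slices(slice_features, target_len):
    """Stretch or compress a (n_slices, n_features) sequence to
    (target_len, n_features) by linear interpolation along time."""
    slice_features = np.asarray(slice_features, dtype=float)
    n_slices, n_features = slice_features.shape
    if n_slices == 1:
        # A single slice has nothing to interpolate; just repeat it.
        return np.repeat(slice_features, target_len, axis=0)
    src = np.linspace(0.0, 1.0, n_slices)    # original slice positions
    dst = np.linspace(0.0, 1.0, target_len)  # positions we want to sample
    columns = [np.interp(dst, src, slice_features[:, j])
               for j in range(n_features)]
    return np.stack(columns, axis=1)

def majority_vote(slice_probs):
    """Baseline for comparison: each slice votes for its argmax class."""
    votes = np.argmax(np.asarray(slice_probs), axis=1)
    return int(np.bincount(votes).argmax())
```

The fixed-size output (flattened, or kept as a sequence) would then be the input to the second net, which could in principle pick up on how the slices progress over the track instead of only counting votes.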
(Nice to see you show up for the discussion. I was worried that you'd given up hope before your article hit the front page.)