The horizontal axis represents musical time, from the beginning to the end of the piece, while the vertical axis shows how far the similarities persist into the higher-level structure of the piece.
If I understand this correctly, the vertical axis runs from "similarity in large-scale structure" at the top down to "similarity at the momentary scale" at the bottom, so the triangle is a sensible way to look at it.
I think it might be easier to read the triangle from bottom to top: the "higher-level structures" build upon, or are made up of, the lower-level structures.
Or at least that's how this musicology-phile interpreted them.