I thought the cool way to do this was to reverse each sub-block in place, and then reverse the whole vector. If you draw a few diagrams you can see that this accomplishes the rotation. Yes, it moves everything twice, but it has near-perfect cache locality, so at least on large blocks it tends to be faster than cycle decomposition or other juggling tricks. It's discussed in one of Bentley's "Programming Pearls" books.
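For anyone who wants to see it rather than draw the diagrams, here is a minimal sketch of the three-reversal rotation in C. The names `reverse_range` and `rotate_left` are made up for illustration, and it assumes an `int` array; this is just one way to write it, not code taken from the book.

```c
#include <stddef.h>

/* Reverse the elements a[lo..hi) in place (hypothetical helper). */
static void reverse_range(int *a, size_t lo, size_t hi)
{
    while (lo < hi) {
        --hi;                 /* hi now indexes the last unswapped element */
        int tmp = a[lo];
        a[lo] = a[hi];
        a[hi] = tmp;
        ++lo;
    }
}

/* Rotate a[0..n) left by k positions: a[k..n) ends up in front,
 * followed by a[0..k). Hypothetical name, int elements assumed. */
void rotate_left(int *a, size_t n, size_t k)
{
    if (n == 0)
        return;
    k %= n;                   /* rotating by n is a no-op */
    reverse_range(a, 0, k);   /* reverse the first sub-block  */
    reverse_range(a, k, n);   /* reverse the second sub-block */
    reverse_range(a, 0, n);   /* reverse the whole vector     */
}
```

Calling `rotate_left(a, 5, 2)` on `{1, 2, 3, 4, 5}` gives `{3, 4, 5, 1, 2}`: every element is touched twice, but each pass walks memory sequentially from both ends.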
Sometimes I want to design a simpler C-like language and build a toolchain for it from scratch, with no historical baggage. Obviously the optimization story will be very poor: gcc carries hundreds of super-qualified man-years of optimization work. But I wonder if it'll be that bad. Modern computers are fast.
With cproc/qbe, on my CPU-heavy benchmarks (compression), I get 70% of gcc's speed (and probably of clang's too). cproc is mostly the work of one person, like qbe.
In other words, on modern CPUs, the fact that such compilers (cproc/qbe is only one alternative, probably the one closest to "real-life" use) are orders of magnitude smaller, and are _not_ written in one of the worst computer languages ever (c++), means that gcc (and clang) is a problem for open source. That's why people need _lean_ open source now.
Moving gcc to c++ was probably one of the worst mistakes in open source, ever. Basically the only reason I can see for this disaster would be to force gcc devs to deal with this brain-damaged computer language, so that gcc would have "real-life" support for it. That is because some software that is critical for some users is written in c++ (and that was a mistake in the first place).
That said, the real end game here is a "worldwide standard CPU ISA" with very performant implementations, software written in assembly (without abusing a macro preprocessor), and probably a set of very-high-level language interpreters themselves written in assembly. Currently, RISC-V is taking shape, slowly, because the "market" is already "saturated" and state-of-the-art production lines are hogged by IP-locked ISAs (and mistakes _will_ be made, which is going to slow it down even further). In this kind of realm, even ISO will have a hard time generating cycles of planned obsolescence in computer language syntax.