Weirdly enough, PGO equalizes the runtimes but also increases them: both methods take approximately 330 µs. (I simply applied PGO to the full benchmark harness; no idea if this is proper.) Them being equal sounds more reasonable, but the increase in runtime doesn't feel right.
Microbenchmarking remains a fickle beast :)