That's great if your app is compute bound. "May all your processes be compute bound." Back in the real world, though, most of the time your process will be I/O bound. I think that's the real innovation of the M1 chip.
It's exactly the "real world" argument that cuts the other way: it turns out that a lot of actual real-world workloads are CPU bound because they are so wastefully implemented. I/O of all kinds has extremely high bandwidth these days, and out-of-order (OoO) execution helps hide the latency.
Important to clarify this every time it comes up: there is no on-die memory on the M1. It is normal, everyday LPDDR4X memory that sits on the same package as the processor, not on the die. It's actually quite high latency at ~100ns.