The article explains it. You offload a lot of the processing the CPU would otherwise do onto the SSD, and you minimize reads and writes for the SSD architecture, which reduces emulation requirements.
Not necessarily, but often enough that the most expensive part in a mid-range or high-end gaming PC is an accelerator card meant to offload certain forms of computation that the CPU can't do as fast...
Or perhaps the H.264/H.265 codecs built into modern CPUs and GPUs?
And this isn't at all a new phenomenon, either. We've been using accelerators and coprocessors (as they were often originally known) for decades.
Latency matters. You can execute instructions much faster from firmware on the SSD, and a single higher-abstraction-level instruction usually translates to many low-level instructions.
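To put rough numbers on that, here is a toy model in Python. Every constant in it is a made-up assumption for illustration, not a measurement of any real SSD, and the function names are hypothetical. It only shows the shape of the argument: if each low-level step costs a host round trip, latency grows with the step count, whereas one high-level command pays the round trip once.

```python
# Toy latency model. All numbers are illustrative assumptions, not measurements.
HOST_ROUND_TRIP_US = 20.0  # assumed cost of one host <-> SSD command round trip
ON_DEVICE_STEP_US = 2.0    # assumed cost of one low-level step run in SSD firmware


def host_driven_latency(low_level_steps: int) -> float:
    """Host issues every low-level step as its own command (one round trip each)."""
    return low_level_steps * (HOST_ROUND_TRIP_US + ON_DEVICE_STEP_US)


def offloaded_latency(low_level_steps: int) -> float:
    """Host issues one high-level command; firmware runs all the steps locally."""
    return HOST_ROUND_TRIP_US + low_level_steps * ON_DEVICE_STEP_US


if __name__ == "__main__":
    # Compare both approaches as one high-level operation expands into more steps.
    for steps in (1, 4, 16):
        print(steps, host_driven_latency(steps), offloaded_latency(steps))
```

Under these assumed numbers the two approaches tie when an operation is a single step, and the gap widens as one high-level instruction expands into more low-level ones.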