The AGP bus was introduced the year this article was written to remove the bus bottleneck for video cards, and it wasn't phased out until PCIe became common in the mid-2000s.
Data transfer from the host CPU to the GPU card can kill the performance of offloading. You need a hefty data-parallel kernel, with high-ish work per element, to get a speedup that's worth the data transfer costs.
GPUs worked well because you could transfer all your large art assets upfront and then only communicate your mesh and shader logic as the game ran. They don't work so well if you need frequent access to system memory.
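To make the trade-off concrete, here is a rough back-of-the-envelope model, not a benchmark. The function name, the 50x kernel speedup, the per-element CPU costs, and the ~15.75 GB/s link rate (roughly PCIe 3 x16) are all illustrative assumptions, and the model ignores transfer latency and any overlap of copy with compute.

```python
def effective_speedup(n_elements, bytes_per_element, cpu_s_per_element,
                      kernel_speedup, link_bytes_per_s=15.75e9):
    """Speedup of offloading once bus transfers are counted (toy model).

    Assumes every element is copied to the device and back once, and that
    copies do not overlap with compute. All parameters are hypothetical.
    """
    cpu_time = n_elements * cpu_s_per_element
    kernel_time = cpu_time / kernel_speedup
    transfer_time = 2 * n_elements * bytes_per_element / link_bytes_per_s
    return cpu_time / (kernel_time + transfer_time)


# Same data size and same 50x kernel, but different work per element.
light = effective_speedup(1_000_000, 4, cpu_s_per_element=1e-9, kernel_speedup=50)
heavy = effective_speedup(1_000_000, 4, cpu_s_per_element=100e-9, kernel_speedup=50)
print(f"light kernel: {light:.1f}x, heavy kernel: {heavy:.1f}x")
```

With little work per element the transfer time dominates and most of the kernel's advantage disappears; with heavy work per element the effective speedup approaches the kernel speedup.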
PCIe 3: 16 lanes * 8 Gtransfers/s * 128/130 (encoding) = ~126 Gbit/s
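The PCIe 3 arithmetic can be checked with a few lines of Python. This is just a sketch of the calculation; the helper name `link_gbit_per_s` is made up for illustration, and the result is the raw link rate after line encoding, before any packet or protocol overhead.

```python
def link_gbit_per_s(lanes, gtransfers_per_s, payload_bits, encoded_bits):
    """Raw link bandwidth in Gbit/s after line-encoding overhead."""
    return lanes * gtransfers_per_s * payload_bits / encoded_bits


# PCIe 3 x16: 8 GT/s per lane with 128b/130b encoding.
pcie3_x16 = link_gbit_per_s(16, 8.0, 128, 130)
print(f"PCIe 3 x16: {pcie3_x16:.1f} Gbit/s = {pcie3_x16 / 8:.2f} GB/s")
```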
So, yes, it has changed quite a bit!
But so has everything else.
If you want performance, you'd still better do it through DMA transfers that bypass the CPU, because otherwise the CPU will still be waiting for thousands of cycles to fetch data from the device on the other side of the bus.
And the transfers that are done by the CPU should be write-only to the bus as much as possible, since a posted write lets the CPU move on while a read stalls it until the data comes back.