by TinkersW 245 days ago
Your average non-shared-memory GPU communicates with the CPU over PCIe, which is dogshit slow, like 100x slower than DRAM.

I can upload an average of about 3.7 MB per millisecond (roughly 3.7 GB/s) to my GPU (PCIe gen 3, x8), but it can be spiky and sometimes takes longer than you might expect.

By comparison, a byte-based AVX2 prefix scan can pretty much run at the speed of DRAM, so there is never any reason to transfer the data to the GPU.