by Dylan16807
807 days ago
It still has to go through the entire memory system. It's hard for me to imagine that transferring a number from the CPU to the GPU is faster than transferring a byte, and if you have 2 CPU-resident numbers per GPU-resident byte that's a lot of transferring.
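
To put rough numbers on that ratio, here is a back-of-the-envelope sketch. It assumes each CPU-resident number is a 16-bit value (an assumption; the width isn't stated in this thread), and the function name is made up for illustration.

```python
def cpu_bytes_per_gpu_byte(numbers_per_gpu_byte: int = 2,
                           bytes_per_number: int = 2) -> float:
    """Bytes that must cross from CPU to GPU for each byte already on the GPU.

    numbers_per_gpu_byte: CPU-resident numbers per GPU-resident byte
                          (2, as in the comment above).
    bytes_per_number:     assumed width of each number in bytes
                          (2 = 16-bit, which is an assumption).
    """
    return float(numbers_per_gpu_byte * bytes_per_number)


if __name__ == "__main__":
    # Under the 16-bit assumption, the transferred data is several times
    # larger than the data that stays resident on the GPU.
    print(cpu_bytes_per_gpu_byte())
    # With 32-bit numbers the ratio doubles.
    print(cpu_bytes_per_gpu_byte(bytes_per_number=4))
```

Even if a single transfer is cheap, moving a multiple of the resident data on every use adds up.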
I think the blog did mention somewhere that HQQ at 1 bit is slower for now, maybe due to the transfer overhead, although I can't remember exactly where.