by shahbazac
922 days ago
Can someone answer some CS 101 questions about this, please? I know there are other methods related to matrix factorization, but I’m asking specifically about quantization. First, does quantization literally mean the weight matrix floats are represented using fewer bits than the 64-bit standard? Second, if fewer bits are used, can CPUs do math directly on them? Aren’t CPU registers still 64-bit? Are these floats converted back to 64-bit for math, or is there some clever packing technique where a 64-bit float actually represents many numbers (a sort of hacky SIMD instruction)? Or do modern CPUs have the hardware to do math on fewer bits?

But even without such support, there’s a benefit from model size compression: bigger models can fit in GPU memory, eliminating costly CPU/GPU data transfers.
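
To make the size argument concrete, here is a minimal sketch of one simple scheme (symmetric, per-tensor int8 quantization), assuming the weights start out as 32-bit floats. The names `quantize_int8` and `dequantize` and the sample weights are made up for illustration; this is not how any particular library does it, and real schemes often use per-channel scales or 4-bit packing.

```python
from array import array


def quantize_int8(weights):
    """Map floats to signed 8-bit integers plus one shared float scale.

    Hypothetical helper: symmetric per-tensor quantization, chosen only
    because it is the simplest scheme to show.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    # Round to the nearest integer step and clamp into the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats; the rounding error is not recoverable."""
    return [v * scale for v in q]


# Illustrative values only, not taken from any real model.
weights = [0.12, -0.5, 0.33, 0.9, -0.75, 0.0, 0.41, -0.08]

fp32 = array("f", weights)           # 4 bytes per weight
q, scale = quantize_int8(weights)    # 1 byte per weight, plus one scale
restored = dequantize(q, scale)

fp32_bytes = fp32.itemsize * len(fp32)
int8_bytes = q.itemsize * len(q)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"fp32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"largest round-trip error: {max_error:.5f} (step size {scale:.5f})")
```

The stored integers take a quarter of the space, and each restored weight is off by at most half a quantization step. Whether the math then runs on the small integers directly or on values converted back to floats depends on the hardware and the kernel, which is the open part of the question above.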