by extheat
680 days ago
A simple equation to approximate it is `memory_in_gb = parameters_in_billions * (bits / 8)`. So for 70 billion parameters:

32-bit full precision: 70 * (32 / 8) ~= 280GB
fp16: 70 * (16 / 8) ~= 140GB
8-bit: 70 * (8 / 8) ~= 70GB
4-bit: 70 * (4 / 8) ~= 35GB

However, in things like llama.cpp the quants are sometimes mixed, so some of the weights are Q5, some Q4, etc. In that case you usually want to estimate with the higher bit width.
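As a rough sketch, the same rule of thumb in Python. The function names here are made up for illustration and are not part of llama.cpp or any other library, and the result is only the approximation given by the equation above, not a measured figure.

```python
from typing import Sequence


def estimate_memory_gb(parameters_in_billions: float, bits: float) -> float:
    """Rule of thumb: billions of parameters times bytes per weight (bits / 8)."""
    return parameters_in_billions * (bits / 8)


def estimate_mixed_quant_gb(parameters_in_billions: float, bit_widths: Sequence[float]) -> float:
    """For mixed quants, be conservative and use the highest bit width present."""
    return estimate_memory_gb(parameters_in_billions, max(bit_widths))


if __name__ == "__main__":
    # Reproduce the 70-billion-parameter figures from the comment.
    for label, bits in [("fp32", 32), ("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{estimate_memory_gb(70, bits):.0f} GB")

    # Hypothetical mix of Q4 and Q5 weights: estimated as if everything were Q5.
    print(f"mixed Q4/Q5: ~{estimate_mixed_quant_gb(70, [4, 5]):.2f} GB")
```

Taking the maximum bit width slightly overestimates a mixed quant, which is usually the safer direction when checking whether a model fits.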