Y
Hacker News
new
|
ask
|
show
|
jobs
by
mauricio
744 days ago
22B params * 2 bytes (FP16) = 44GB just for the weights. Doesn't include KV cache and other things.
When the model gets quantized to say 4bit ints, it'll be 22B params * 0.5 bytes = 11GB for example.