Hacker News
by mauricio 744 days ago
22B params * 2 bytes (FP16) = 44GB just for the weights. That doesn't include the KV cache, activations, or other runtime overhead.

If the model is quantized to, say, 4-bit ints, it'll be 22B params * 0.5 bytes = 11GB.
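A quick sketch of the same arithmetic. This assumes decimal gigabytes (1 GB = 10^9 bytes) and counts weights only. The helper name `weight_memory_gb` and the `BYTES_PER_PARAM` table are hypothetical, made up for illustration.

```python
# Hypothetical helper: estimates weight-only memory. It deliberately ignores
# the KV cache, activations, and runtime overhead, as in the comment above.
BYTES_PER_PARAM = {
    "fp16": 2.0,  # 16-bit floats: 2 bytes per parameter
    "int4": 0.5,  # 4-bit ints: half a byte per parameter
}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n_params = 22e9  # 22B parameters
    for dtype in ("fp16", "int4"):
        print(f"{dtype}: {weight_memory_gb(n_params, dtype):.1f} GB")
```

Real 4-bit formats usually also store per-group scale factors, so actual files come out a bit larger than the bare 0.5 bytes/param figure.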