by smokel 648 days ago

Does anyone know why the sizes of these models are typically expressed as a number of weights (e.g. 1.5B and 9B in this case), without mentioning the size of the weights in bytes?

For practical reasons, I often like to know how much GPU RAM is required to run these models locally. The actual number of weights seems to only express some kind of relative power, which I doubt is relevant to most users.

Edit: reformulated to sound like a genuine question instead of a complaint.

3 comments

magnat 648 days ago

Because you can quantize a model, e.g. from the original 16 bits down to 5 bits per weight, to fit your available memory.
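
To put rough numbers on that, here is a minimal sketch of the arithmetic, using the 1.5B and 9B sizes from the question. The helper name `weight_memory_gib` is made up for illustration. It counts the weights only, so it leaves out the KV cache, activations, and runtime overhead, and real quantization formats also store scales, so their effective bits per weight come out a bit higher than the nominal figure.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB (hypothetical helper)."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                      # bytes -> GiB


if __name__ == "__main__":
    # Model sizes mentioned in the question; bit widths mentioned in this thread,
    # plus 8 bits as a common middle ground (an assumption, not from the thread).
    for params in (1.5e9, 9e9):
        for bits in (16, 8, 5):
            gib = weight_memory_gib(params, bits)
            print(f"{params / 1e9:g}B params at {bits} bits/weight: ~{gib:.2f} GiB")
```

Treat the result as a lower bound on the GPU RAM you need, not the total.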

GaggiX 648 days ago

The weight size depends on the precision at which you run the model. You usually do not run a model at FP16, as it would be wasteful.

tarruda 648 days ago

Since most LLMs are released as FP16 (2 bytes per weight), the number of parameters alone is enough to estimate the GPU RAM required for the weights.