by SushiHippie
757 days ago
They said: "Even running a 7B will take 14GB if it's fp16." Which is correct, fp16 takes two bytes per weight, so it will be 7 billion * 2 bytes which is exactly 14GB. They are probably aware that you could run it with 4-bit quantization (which would use 1/4 of the RAM) but explicitly mentioned fp16.
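
For anyone who wants to check the arithmetic, here is a rough sketch of the weight-only estimate. The function name `weight_memory_gb` is made up for illustration, and it assumes decimal gigabytes (10^9 bytes) and counts only the weights. KV cache, activations, and runtime overhead are ignored, so real usage would be somewhat higher.

```python
# Weight-only memory estimate for a model at a given precision.
# Assumptions (not from the thread): decimal GB, no KV cache,
# no activations, no runtime overhead.
BYTES_PER_GB = 10**9


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Return the memory needed to hold the weights, in decimal GB."""
    return num_params * bits_per_weight / 8 / BYTES_PER_GB


params_7b = 7_000_000_000

fp16_gb = weight_memory_gb(params_7b, 16)  # 2 bytes per weight
int4_gb = weight_memory_gb(params_7b, 4)   # half a byte per weight

print(f"7B at fp16:  {fp16_gb} GB")
print(f"7B at 4-bit: {int4_gb} GB")
```

Under these assumptions the 4-bit figure comes out to a quarter of the fp16 figure, which matches the "1/4 of the RAM" remark. Real quantization formats usually store some extra scaling data per block of weights, so the actual ratio is a bit worse than 1/4.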
|
> Which is correct, fp16 takes two bytes per weight, so it will be 7 billion * 2 bytes which is exactly 14GB.
As I said, it is "entirely irrelevant", which is the exact wording I used. Nowhere did I say that the calculation was wrong for fp16. Irrelevant numbers like that can be misleading to people unfamiliar with the subject matter.
No one is deploying LLMs to end users at fp16. It would be a huge waste and provide a bad experience. This discussion is about Copilot+, which is all about managed AI experiences that "just work" for the end user. Professional-grade stuff, and I believe Microsoft has good enough engineers to know better than to deploy fp16 LLMs to end users.