by ikeashark 616 days ago
I believe it comes from the original Llama papers, where they chose these sizes because each one fits nicely on one of the standard ML compute GPUs.

Model size + overhead (context length, etc.):

7B: 13 GB - fits on T4 (16 GB).

13B: 26 GB - fits on V100 (32 GB).

30B: 65 GB - fits on A100 (80 GB).

65B: 131 GB - fits on 2x A100 (160 GB).

That's it really.
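
For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the weights are stored in fp16/bf16 (2 bytes per parameter) and plugs in parameter counts of 6.7B, 13.0B, 32.5B and 65.2B. Both of those are my assumptions, not something stated in the comment; only the GPU memory sizes come from the list above. The names `weights_gb` and `headroom_gb` are made up for illustration.

```python
# Weights-only estimate: parameter count times bytes per parameter.
# Assumption: fp16/bf16 storage (2 bytes per parameter); KV cache and
# activation overhead are ignored.
BYTES_PER_PARAM = 2

# (label, params in billions, GPU setup, GPU memory in GB)
# Parameter counts are assumed, not taken from the comment; the GPU
# memory figures are the ones the comment lists.
MODELS = [
    ("7B", 6.7, "T4", 16),
    ("13B", 13.0, "V100", 32),
    ("30B", 32.5, "A100", 80),
    ("65B", 65.2, "2x A100", 160),
]


def weights_gb(params_billion, bytes_per_param=BYTES_PER_PARAM):
    """Return the weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def headroom_gb(params_billion, gpu_gb):
    """Return GPU memory left over for context and other overhead."""
    return gpu_gb - weights_gb(params_billion)


if __name__ == "__main__":
    for label, params, gpu, gpu_gb in MODELS:
        print(f"{label}: {weights_gb(params):.1f} GB of weights, "
              f"{headroom_gb(params, gpu_gb):.1f} GB left on {gpu} ({gpu_gb} GB)")
```

Whatever `headroom_gb` reports is what has to cover context length and the rest of the overhead, so a tighter fit means less room for long prompts.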