by ikeashark
616 days ago
I believe it comes from the original Llama papers, where they chose these sizes because each one fits nicely on one of the standard ML compute GPUs.

Model size + overhead (context length, etc.):

7B: 13 GB - fits on a T4 (16 GB)
13B: 26 GB - fits on a V100 (32 GB)
30B: 65 GB - fits on an A100 (80 GB)
65B: 131 GB - fits on 2x A100 (160 GB)

That's it, really.
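
As a rough check on those figures, here is a minimal sketch. It assumes the weights are stored in 16-bit precision (2 bytes per parameter) and uses approximate parameter counts of 6.7B, 13.0B, 32.5B and 65.2B for the four sizes. Neither assumption is stated in the comment itself, and the helper names are made up for illustration.

```python
# Assumption: 16-bit weights (fp16/bf16), i.e. 2 bytes per parameter.
BYTES_PER_PARAM = 2

# Assumption: approximate parameter counts in billions for each nominal size.
PARAMS_BILLIONS = {"7B": 6.7, "13B": 13.0, "30B": 32.5, "65B": 65.2}

# GPU pairings and memory capacities (GB) as quoted in the comment.
GPU_GB = {
    "7B": ("T4", 16),
    "13B": ("V100", 32),
    "30B": ("A100", 80),
    "65B": ("2x A100", 160),
}


def weights_gb(params_billions, bytes_per_param=BYTES_PER_PARAM):
    """Weight footprint in GB: (billions of params) * (bytes per param)."""
    return params_billions * bytes_per_param


def headroom_gb(name):
    """Memory left over for overhead (context, activations) on the paired GPU."""
    _, capacity = GPU_GB[name]
    return capacity - weights_gb(PARAMS_BILLIONS[name])


if __name__ == "__main__":
    for name, params in PARAMS_BILLIONS.items():
        gpu, capacity = GPU_GB[name]
        print(f"{name}: {weights_gb(params):.1f} GB weights on {gpu} "
              f"({capacity} GB), {headroom_gb(name):.1f} GB headroom")
```

Under these assumptions the weights alone come out close to the sizes listed, so whatever is left on each GPU is what remains for context and other overhead.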