by rahen
455 days ago
I'm pretty surprised by the claimed memory usage for 300B parameters (table 1).
If we compare similar models:

- Llama 3.1 with 405B parameters: 2 TB of memory (FP32), 500 GB (FP8)
- DeepSeek R1 with 671B parameters: 1.3 TB (scaling linearly, around 600 GB for 300B parameters)

Ling claims no more than 96 GB of memory, most likely for inference. That's far more than a 20% reduction. Am I missing something?
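
For reference, a rough sketch of the weights-only arithmetic behind this kind of comparison. It assumes decimal gigabytes and ignores KV cache, activations, and runtime overhead, so real usage would be higher. The function names and the dtype table are illustrative, not taken from the paper.

```python
# Weights-only memory estimate: parameter count x bytes per parameter.
# Assumption: decimal GB (1e9 bytes); KV cache and activations are ignored.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Memory in GB needed to hold the weights alone at the given precision."""
    # (params_billions * 1e9 params) * bytes / (1e9 bytes per GB)
    return params_billions * BYTES_PER_PARAM[dtype]


def implied_bits_per_param(params_billions: float, memory_gb: float) -> float:
    """Bits per parameter implied if every weight had to fit in memory_gb."""
    return memory_gb * 8 / params_billions


for name, size, dtype in [
    ("Llama 3.1 405B", 405, "fp32"),
    ("Llama 3.1 405B", 405, "fp8"),
    ("DeepSeek R1 671B", 671, "bf16"),
    ("300B model", 300, "fp8"),
]:
    print(f"{name} @ {dtype}: {weight_memory_gb(size, dtype):.0f} GB")

print(f"300B in 96 GB: {implied_bits_per_param(300, 96):.2f} bits/param")
```

By this estimate, fitting all 300B parameters in 96 GB would mean about 2.56 bits per parameter, which is below even 4-bit quantization, so the 96 GB figure presumably does not describe holding the full set of weights at once.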