by rahen 455 days ago
I'm pretty surprised by the claimed memory usage for 300B parameters (table 1). If we compare similar models:

- Llama 3.1 with 405B parameters: about 1.6 TB for the weights alone in FP32, about 405 GB in FP8

- DeepSeek R1 with 671B parameters: 1.3 TB at 16 bits per parameter (scaling linearly, around 600 GB for 300B parameters)

Ling claims no more than 96 GB of memory, most likely for inference. That's far more than a 20% reduction. Am I missing something?
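
For anyone who wants to redo the comparison, here is a back-of-envelope sketch. It counts weights only (no KV cache, activations, or runtime overhead), so it gives a lower bound rather than a measured footprint. The function name and the choice of decimal GB are my own assumptions, not anything from the paper.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only footprint in decimal GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bits each / 8 bits per byte = billions of bytes
    return params_billion * bits_per_param / 8


if __name__ == "__main__":
    # Parameter counts are the ones quoted in this thread.
    for name, params_b, bits in [
        ("Llama 3.1 405B, FP32", 405, 32),
        ("Llama 3.1 405B, FP8", 405, 8),
        ("DeepSeek R1 671B, 16-bit", 671, 16),
        ("300B model, FP8", 300, 8),
    ]:
        print(f"{name}: {weight_memory_gb(params_b, bits):.0f} GB")
```

Even at 8 bits per parameter, a 300B model needs about 300 GB for the weights alone, which is why a 96 GB figure looks off for a model of that size.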

2 comments

I think they only claim that their "Ling-Lite" 17B model fits on a single 96 GB GPU; their 300B model needs 8 of them (768 GB of HBM).
Some of these models still produce great results when quantized to something as low as 2.7 bits per weight.
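
To put rough numbers on the quantization point, here is a minimal sketch that inverts the usual estimate: given a memory budget, how many bits per weight can you afford? It assumes weights only and decimal GB, and uses the 17B / 300B sizes and 96 GB / 768 GB budgets mentioned in this thread. The function names are illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def max_bits_per_weight(params_billion: float, budget_gb: float) -> float:
    """Largest average bits per weight that still fits in budget_gb (weights only)."""
    return budget_gb * 8 / params_billion


if __name__ == "__main__":
    # 17B model on one 96 GB GPU, 16-bit weights
    print(f"17B @ 16 bits: {weight_memory_gb(17, 16):.1f} GB (budget 96 GB)")
    # 300B model across 8 x 96 GB = 768 GB, 16-bit weights
    print(f"300B @ 16 bits: {weight_memory_gb(300, 16):.1f} GB (budget 768 GB)")
    # 300B model at 2.7 bits per weight
    print(f"300B @ 2.7 bits: {weight_memory_gb(300, 2.7):.2f} GB")
    # Bits per weight needed to squeeze 300B into a single 96 GB GPU
    print(f"300B in 96 GB needs <= {max_bits_per_weight(300, 96):.2f} bits/weight")
```

By this estimate a 300B model at 2.7 bits per weight is about 101 GB, just over a single 96 GB GPU, and fitting it would take roughly 2.56 bits per weight or less before counting the KV cache. At 16 bits, the 17B model (34 GB) and the 300B model (600 GB) both fit comfortably in the 96 GB and 768 GB budgets respectively.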