|
|
|
|
|
by zackangelo
668 days ago
|
|
L40S has 48GB of RAM, curious how they're able to run Llama 3.1 70B on it. The weights alone would exceed this. Maybe they mean quantized/fp8? I just had to implement GPU clustering in my inference stack to support Llama 3.1 70b, and even then I needed 2xA100 80GB SXMs. I was initially running my inference servers on fly.io because they were so easy to get started with. But I eventually moved elsewhere because the prices were so high. I pointed out to someone there that e-mailed me that it was really expensive vs. others and they basically just waved me away. For reference, you can get an A100 SXM 80GB spot instance on google cloud right now for $2.04/hr ($5.07 regular). |
|