| HN Mirror


by zackangelo 715 days ago

The L40S has 48GB of VRAM, so I'm curious how they're able to run Llama 3.1 70B on it. The weights alone would exceed this. Maybe they mean quantized/fp8?

I just had to implement GPU clustering in my inference stack to support Llama 3.1 70b, and even then I needed 2xA100 80GB SXMs.
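To make the memory arithmetic above concrete, here is a rough back-of-the-envelope sketch. It assumes a nominal 70e9 parameters, counts weights only (no KV cache, activations, or framework overhead), and treats the advertised 48GB and 80GB capacities as decimal gigabytes, so real deployments will need more headroom than this suggests. The function names are made up for illustration.

```python
import math

# Assumption: nominal parameter count for a "70B" model.
PARAMS = 70e9

# Bytes per parameter for a few common weight formats.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Size of the weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9


def min_gpus(weights_gb: float, gpu_gb: float) -> int:
    """Lower bound on GPU count needed just to hold the weights."""
    return math.ceil(weights_gb / gpu_gb)


for name, nbytes in BYTES_PER_PARAM.items():
    gb = weight_gb(PARAMS, nbytes)
    print(
        f"{name}: {gb:.0f} GB of weights, "
        f"at least {min_gpus(gb, 48)} x L40S (48GB) "
        f"or {min_gpus(gb, 80)} x A100 (80GB)"
    )
```

Under these assumptions, fp16 weights come to about 140 GB, which is consistent with needing two 80GB A100s, and even fp8 weights (about 70 GB) would not fit in a single 48GB card.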

I was initially running my inference servers on fly.io because they were so easy to get started with, but I eventually moved elsewhere because the prices were so high. When someone there emailed me, I pointed out that it was really expensive compared to others, and they basically just waved me away.

For reference, you can get an A100 SXM 80GB spot instance on Google Cloud right now for $2.04/hr ($5.07 regular).


tptacek 715 days ago

Our standard A100 SXM 80GB price is $3.50/hr, for what it's worth.


Palmik 715 days ago

For reference, that's at least 40% more than what an H100 SXM would cost if you are willing to reserve for a month (so not apples to apples).

The H100 will also be much faster, especially if you are willing to use fp8, maybe 3-4x.
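A quick sketch of what those numbers imply, using only the prices quoted in this thread. The "at least 40% more" figure is read here as a ceiling on the reserved H100 price, and the 3-4x speedup is the rough guess from the comment above, not a benchmark. The helper names are hypothetical.

```python
# Hourly prices quoted in the thread (USD).
A100_FLY = 3.50
A100_GCP_SPOT = 2.04
A100_GCP_REGULAR = 5.07


def implied_base_price(price: float, pct_more: float) -> float:
    """If `price` is pct_more percent above some base price, return that base."""
    return price / (1 + pct_more / 100)


def cost_per_unit_work(price_per_hr: float, relative_speed: float) -> float:
    """Hourly price divided by throughput relative to an A100 (A100 = 1.0)."""
    return price_per_hr / relative_speed


# "At least 40% more" implies the reserved H100 costs at most this much per hour.
h100_ceiling = implied_base_price(A100_FLY, 40)
print(f"Implied H100 ceiling: ${h100_ceiling:.2f}/hr")

# Assumption: the H100 is 3x to 4x faster, as guessed in the comment above.
for speedup in (3.0, 4.0):
    h100_cost = cost_per_unit_work(h100_ceiling, speedup)
    a100_cost = cost_per_unit_work(A100_FLY, 1.0)
    print(f"{speedup:.0f}x faster: A100 costs {a100_cost / h100_cost:.1f}x more per unit of work")
```

The comparison is only as good as its inputs: a month-long reservation and an on-demand hourly rate are different commitments, as the comment itself notes.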
