
by MacsHeadroom 1140 days ago

I don't know how no one has mentioned this yet: the $180 Nvidia Tesla P40 24GB is about as capable as a 4090 for running LLMs (~70% of the token throughput at roughly one-eighth the price). You can even run two or more together, splitting the model across cards, to run 65B or larger models.

Just search eBay for Nvidia P40. Be sure to add an aftermarket cooling fan ($15 on eBay), as the P40 does not come with its own.

The P40 is a LOT faster than an ARM Mac, and a lot cheaper.

(Note: Do not go older than a P40. Pascal or newer is required to run 4-bit quantized models. For example, the $100 M40 24GB is effectively only 6GB, as it must run models in 16-bit.)
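The 6GB figure is just a precision ratio: weights that fill 24GB at 16 bits would take a quarter of that at 4 bits. A rough sketch of the arithmetic, counting weights only (KV cache, activations and framework overhead are ignored, and the function names are made up for illustration):

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def effective_vram_gb(vram_gb: float, native_bits: int, target_bits: int) -> float:
    """VRAM of a card stuck at native_bits, expressed in target_bits-equivalent capacity."""
    return vram_gb * target_bits / native_bits


# An M40 24GB limited to 16-bit holds what a 4-bit-capable card holds in:
print(effective_vram_gb(24, native_bits=16, target_bits=4), "GB")

# Weight footprint of the model sizes mentioned in the thread at 4-bit:
for size in (30, 65):
    print(f"{size}B @ 4-bit: ~{weight_memory_gb(size, 4)} GB")
```

By the same estimate a 65B model at 4-bit needs about 32.5GB for weights alone, which is why it takes two 24GB cards.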

2 comments

oersted 1140 days ago

Can you provide sources for this claim? If true, how? What is it that the 4090 has that the P40 doesn't to justify the price?

I understand that the 4090 is aimed at gaming and has a lot of extra bells and whistles like the RT cores. But it is also consumer electronics, and much cheaper than the enterprise GPU lines for the same compute performance.

According to this, the 4090 already has double the raw FLOPS of the V100 and is competitive with the most powerful GPUs on the market from last year:

https://www.aime.info/blog/en/deep-learning-gpu-benchmarks-2...

And according to this, the V100 is ~60% faster than the P40:

https://ai-benchmark.com/ranking_deeplearning_detailed.html

Not that these sources look particularly reliable, but still, they are consistent with intuition.

KaoruAoiShiho 1140 days ago

The claim of 70% of a 4090 is very strange: my 4090 runs a 30B model at roughly 25 tokens/second, compared to the 1 token/second claimed by the P40 user here: https://news.ycombinator.com/item?id=35861360

MacsHeadroom 1140 days ago

> compared to the 1 token/second claimed by the P40 user

That user is doing something wrong. My guess is that they are not cooling the card and it is thermal throttling.

The P40 is capable of upwards of 10 tokens/second with a 30B model.
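Taking the numbers quoted in this thread at face value (25 tokens/second on the 4090, up to 10 on a properly cooled P40, 1 in the linked report), the ratios can be checked in a few lines. The 4090 price below is not stated anywhere in the thread; it is only what the roughly 8x price gap claimed above would imply relative to $180.

```python
P40_PRICE_USD = 180
# Assumption: derived from the claimed ~8x price gap, not a quoted price.
RTX4090_PRICE_USD = 8 * P40_PRICE_USD
RTX4090_TPS = 25.0  # 30B model, as reported in the thread


def relative_throughput(candidate_tps: float, baseline_tps: float) -> float:
    """Candidate throughput as a fraction of the baseline."""
    return candidate_tps / baseline_tps


def tps_per_dollar(tps: float, price_usd: float) -> float:
    """Tokens/second bought per dollar of GPU."""
    return tps / price_usd


for label, p40_tps in [("P40, claimed best case", 10.0), ("P40, linked report", 1.0)]:
    ratio = relative_throughput(p40_tps, RTX4090_TPS)
    value = tps_per_dollar(p40_tps, P40_PRICE_USD) / tps_per_dollar(
        RTX4090_TPS, RTX4090_PRICE_USD
    )
    print(f"{label}: {ratio:.0%} of 4090 throughput, {value:.2f}x the tokens/s per dollar")
```

At 10 tokens/second the P40 lands at 40% of the 4090's throughput rather than 70%, though under the assumed price that is still about 3.2x the throughput per dollar.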


BaculumMeumEst 1140 days ago

i was looking into an nvidia k80 before (so thanks for including your comment about needing pascal or greater) but i had a couple of concerns about the power connectors and pcie lanes/speed.

i read that data center gpus need specialized power adapters, and i didn't find good resources to see if it would be able to hook up to a consumer grade power supply or what adapters i would need

i think my tomahawk b450's pcie 3.0 x16 would suffice, but i'm not 100% sure if there would be bandwidth issues when running an nvme ssd alongside it

driver-wise i think i would be fine, i'm not sure if datacenter drivers are typically included in what's provided by linux distros but i'm sure i could make it work if not

and yeah i would definitely grab a cooler and probably undervolt and/or run it at a slightly lower clock speed to be safe
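On the PCIe question, the theoretical ceiling of a PCIe 3.0 x16 slot can be worked out from the spec numbers (8 GT/s per lane, 128b/130b encoding). A rough sketch, where the 15GB model size is only an illustrative figure for a 30B model at 4-bit, and real transfers will come in below the theoretical peak:

```python
def pcie_bandwidth_gb_s(gt_per_s: float, lanes: int, encoding_efficiency: float) -> float:
    """Theoretical one-direction bandwidth in GB/s (10^9 bytes per second)."""
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return gt_per_s * encoding_efficiency * lanes / 8


def load_time_s(model_gb: float, bandwidth_gb_s: float) -> float:
    """Lower bound on the time to copy the weights across the link."""
    return model_gb / bandwidth_gb_s


# PCIe 3.0: 8 GT/s per lane with 128b/130b line encoding.
gen3_x16 = pcie_bandwidth_gb_s(8.0, lanes=16, encoding_efficiency=128 / 130)
print(f"PCIe 3.0 x16: ~{gen3_x16:.2f} GB/s")

# Illustrative assumption: ~15GB of weights (30B parameters at 4-bit).
print(f"Best-case weight upload: ~{load_time_s(15.0, gen3_x16):.2f} s")
```

When the whole model fits in VRAM on a single card, the weights cross the bus once at load time, so the link speed should mostly affect how long the model takes to load rather than tokens/second.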
