by MacsHeadroom
1140 days ago
I don't know how no one has mentioned this yet: the $180 Nvidia Tesla P40 24GB is about as capable as a 4090 for running LLMs (~70% of the token throughput at roughly 1/8 the price). You can even run two or more together, splitting the model across the cards, to run 65B or larger models. Just search eBay for Nvidia P40. Be sure to add an aftermarket cooling fan ($15 on eBay), as the P40 does not come with its own. The P40 is a LOT faster than an ARM Mac, and a lot cheaper. (Note: Do not go older than a P40. Pascal or newer is required to run 4-bit quantized models. For example, the $100 M40 24GB is effectively only 6GB, as it must run models in 16-bit.)
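To make the memory arithmetic in the comment above concrete, here is a rough sketch. It counts weight storage only and assumes 1 GB = 1e9 bytes; activations, KV cache, and quantization overhead are ignored, so real requirements are higher. The function names are made up for illustration.

```python
import math


def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def cards_needed(params_billion: float, bits_per_weight: int, vram_gb: float = 24.0) -> int:
    """Minimum number of cards whose combined VRAM holds the weights (no overhead)."""
    return math.ceil(weight_gb(params_billion, bits_per_weight) / vram_gb)


def four_bit_equivalent_gb(vram_gb: float, native_bits: int = 16) -> float:
    """VRAM a 4-bit-capable card would need to hold the same number of weights."""
    return vram_gb * 4 / native_bits


print(weight_gb(65, 4))            # 32.5 GB of weights for a 65B model at 4-bit
print(cards_needed(65, 4))         # 2 cards with 24 GB each
print(four_bit_equivalent_gb(24))  # 6.0, the "effectively only 6GB" figure for the M40
```

Under these assumptions a 65B model at 4-bit does not fit on one 24GB card but does fit on two, and a 24GB card limited to 16-bit weights holds as many parameters as a 6GB card running 4-bit.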
I understand that the 4090 is aimed at gaming and has a lot of extra bells and whistles like the RT cores. But it is also consumer electronics, and much cheaper than the enterprise GPU lines for the same compute power.
According to this, the 4090 already has double the raw FLOP performance of the V100 and is competitive with the most powerful GPUs on the market from last year:
https://www.aime.info/blog/en/deep-learning-gpu-benchmarks-2...
And according to this, the V100 is ~60% faster than the P40:
https://ai-benchmark.com/ranking_deeplearning_detailed.html
Not that these sources look particularly reliable, but still, they are consistent with intuition.
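Chaining the two ratios quoted above gives a rough 4090-to-P40 figure. This is only a sketch: the inputs are the approximate numbers from this comment (2x and ~60%), they come from two different benchmarks, and they describe raw compute rather than LLM token throughput.

```python
# Ratios as quoted in the comment; both are approximate and from different sources.
RTX4090_OVER_V100 = 2.0  # "double the raw flop performance of the V100"
V100_OVER_P40 = 1.6      # "the V100 is ~60% faster than the P40"


def chain(*ratios: float) -> float:
    """Multiply pairwise speed ratios to get an end-to-end ratio."""
    result = 1.0
    for ratio in ratios:
        result *= ratio
    return result


rtx4090_over_p40 = chain(RTX4090_OVER_V100, V100_OVER_P40)
p40_fraction_of_4090 = 1 / rtx4090_over_p40

print(f"4090 vs P40: {rtx4090_over_p40:.1f}x")          # 3.2x
print(f"P40 as share of 4090: {p40_fraction_of_4090:.0%}")  # 31%
```

By these numbers the P40 lands at roughly a third of a 4090 in raw compute, which is well below the ~70% token-throughput figure claimed in the parent comment.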