by nullc
868 days ago
P40 is essentially a faster 1080 with 24GB ram. For many tasks (including LLMs) it's easy to be memory bandwidth bottlenecked, and if you are, they are more evenly matched. (Newer hardware has more bandwidth, sure, but not in a cost-proportional manner.) I find that my hosts using 9x P40 do inference on 70b models MUCH MUCH faster than e.g. a dual 7763 and cost a lot less... and can also support 200B parameter models! For the price of a single 4090, which doesn't have enough ram to run anything I'm interested in, I can have slower cards which have cumulatively 15 times the memory and cumulatively 3.5 times the memory bandwidth.
Technically, the P40 is rated at an impressive 347.1GB/sec memory bandwidth, and the 4060 at a somewhat lower 272GB/sec. For bandwidth-limited workloads, the P40 still wins.
The 4090 is about 3-4x that, but as you point out, is not cost-competitive.
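To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes generating each token means reading every weight from memory once, so tokens/sec is capped at bandwidth divided by model size. The function names and the 4-bit, 70B example are my own assumptions for illustration, and the estimate ignores compute, KV cache traffic, and inter-card transfers, so treat it as an upper bound rather than a benchmark.

```python
def model_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (hypothetical helper, weights only)."""
    return params_billions * bits_per_weight / 8


def max_tokens_per_sec(bandwidth_gb_s: float, size_gb: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / size_gb


# Assumed example: a 70B model quantized to 4 bits per weight.
size = model_size_gb(70, 4)

# Bandwidth figures are the ones quoted in this thread.
for name, bandwidth in [("P40", 347.1), ("4060", 272.0)]:
    print(f"{name}: at most ~{max_tokens_per_sec(bandwidth, size):.1f} tokens/sec")
```

With the model split layer-wise across several cards, only one card is reading at a time, so the single-card figure is probably the more honest ceiling than the cumulative bandwidth.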
What do you use to fit 9x P40 cards in one machine, supply them with 2-3kW of power, and keep them cooled? The best I've found are older rackmount servers, and the ones I was looking at stopped short of that.