Hacker News
by eurekin 876 days ago
Just make sure you're comfortable with manually compiling bitsandbytes and, more generally, assembling a software stack of nearly out-of-date libraries.

The P40 still works with CUDA 12.2 at the moment. I used to use K80s (which I think I paid about $50 for!), and they turned into a huge mess because of the older libraries they needed, especially since essentially all ML software is on a crazy upgrade cadence, with everything constantly breaking even without orphaned old software in the mix.
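A lot of the "compile it yourself" pain comes down to whether a prebuilt binary includes kernels for your card's compute capability (the P40 is 6.1, the K80 is 3.7). Here is a minimal sketch, assuming PyTorch is the build you want to check. It compares each device's capability against `torch.cuda.get_arch_list()` using CUDA's compatibility rules: an `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, and `compute_XY` PTX can be JIT-compiled for equal or newer devices. Libraries like bitsandbytes ship their own binaries, so this only tells you about the PyTorch build. `parse_arch` and `arch_supported` are hypothetical helpers, not part of any library.

```python
"""Check whether a build's compiled CUDA architectures cover a given GPU (sketch)."""


def parse_arch(arch):
    """'sm_61' -> ('sm', 6, 1); None for entries this sketch does not handle (e.g. 'sm_90a')."""
    kind, _, digits = arch.partition("_")
    if kind not in ("sm", "compute") or not digits.isdigit() or len(digits) < 2:
        return None
    return kind, int(digits[:-1]), int(digits[-1])


def arch_supported(capability, arch_list):
    """True if any entry in arch_list can run on a device with this (major, minor)."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue
        kind, a_major, a_minor = parsed
        if kind == "sm" and a_major == major and a_minor <= minor:
            # A cubin built for X.y runs on X.z devices with z >= y.
            return True
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            # PTX can be JIT-compiled by the driver for equal or newer devices.
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is None or not torch.cuda.is_available():
        print("PyTorch with CUDA not available; nothing to check")
    else:
        archs = torch.cuda.get_arch_list()
        print("CUDA runtime:", torch.version.cuda, "compiled archs:", archs)
        for i in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(i)
            ok = arch_supported(cap, archs)
            status = "covered" if ok else "NOT covered, may need a source build"
            print(f"GPU {i} {torch.cuda.get_device_name(i)} sm_{cap[0]}{cap[1]}: {status}")
```

If your card shows up as not covered, that's usually the point where you end up building from source.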

You can also get GPU server chassis with 10 PCIe slots for around $2k on eBay! But note that there is a hardware limitation on the PCIe cards: each card can only communicate directly with 8 others at a time. Beware, they're LOUD even by the standards of server hardware.
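To see what the topology looks like in a multi-GPU chassis, here is a rough sketch, assuming PyTorch is installed. `torch.cuda.can_device_access_peer(i, j)` reports whether peer-to-peer access is possible between a pair of GPUs. It may well report more than 8 possible peers per card, because a cap like the one described above tends to surface only when peer access is actually enabled (the CUDA runtime returns `cudaErrorTooManyPeers` when a device's peer resources are exhausted). So treat `over_limit` as a hint about which cards you would have to be selective with, not as a detector of the limit itself. The 8-peer default just mirrors the comment, and `peer_counts` / `over_limit` are hypothetical helpers.

```python
"""Print which GPU pairs report peer-to-peer (P2P) access and how many peers each has (sketch)."""


def peer_counts(matrix):
    """matrix[i][j] is True if GPU i reports it can access GPU j directly."""
    return [sum(1 for j, ok in enumerate(row) if ok and j != i)
            for i, row in enumerate(matrix)]


def over_limit(matrix, max_peers=8):
    """Indices of GPUs that report more possible peers than max_peers."""
    return [i for i, n in enumerate(peer_counts(matrix)) if n > max_peers]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is None or not torch.cuda.is_available():
        print("PyTorch with CUDA not available")
    else:
        n = torch.cuda.device_count()
        # Diagonal is always False; off-diagonal entries come from the driver.
        matrix = [[i != j and torch.cuda.can_device_access_peer(i, j)
                   for j in range(n)] for i in range(n)]
        for i, row in enumerate(matrix):
            print(i, "".join("X" if ok else "." for ok in row))
        print("possible peers per GPU:", peer_counts(matrix))
        print("GPUs reporting more than 8 possible peers:", over_limit(matrix))
```

`nvidia-smi topo -m` gives a complementary view of how the cards hang off the PCIe switches and CPUs.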

Oh, also: the NVIDIA Tesla power connectors use CPU-connector-style polarity instead of PCIe, so at least in my chassis I needed adapters for them.

Also keep in mind that the Tesla cards don't have fans, so if you aren't using a dedicated GPU chassis, you have to provide the cooling yourself.
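Because passively cooled Tesla cards depend entirely on whatever airflow you give them, it helps to watch temperatures while you sort out cooling. Here is a minimal sketch, assuming `nvidia-smi` is on the PATH. It uses the `--query-gpu` and `--format=csv,noheader,nounits` options and flags any card above a threshold. The 85 C limit and the `hot_gpus` helper are hypothetical, so pick a limit based on your card's own documentation.

```python
"""Flag GPUs running hotter than a chosen limit, using nvidia-smi's CSV query mode (sketch)."""
import csv
import io
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=index,name,temperature.gpu",
         "--format=csv,noheader,nounits"]


def hot_gpus(csv_text, limit_c):
    """Return (index, name, temp_c) for every GPU whose temperature exceeds limit_c."""
    hot = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        index, name, temp = row
        try:
            temp_c = int(temp)
        except ValueError:
            continue  # e.g. "[N/A]" when the sensor can't be read
        if temp_c > limit_c:
            hot.append((int(index), name, temp_c))
    return hot


if __name__ == "__main__":
    LIMIT_C = 85  # hypothetical threshold, not a vendor-specified value
    try:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print("could not run nvidia-smi:", err)
    else:
        for index, name, temp_c in hot_gpus(out, LIMIT_C):
            print(f"GPU {index} ({name}) is at {temp_c} C, above {LIMIT_C} C")
```

Running it under a sustained load (e.g. from cron or in a loop) is more telling than checking an idle card.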

That's a good point. Are you referring to the out-of-date CUDA libraries?
I don't remember exactly (either CUDA itself or the cuDNN version used by FlashAttention)... Anyway, /r/localLlama has a few examples of such builds. It might be really worthwhile to look those up before buying.