by washadjeffmad 1113 days ago
P40s are kind of a meme. Running GGML models on a dual-channel DDR5 system gets roughly the same performance at a fraction of the wattage.

I still use GPTQ for 30B, but even CPU-only inference generates quickly enough at q5_1 on modern hardware.
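
For a rough sense of the numbers, here's a back-of-envelope sketch. It assumes single-stream generation is memory-bandwidth-bound, so the ceiling on tokens/s is roughly RAM bandwidth divided by model size. The 30e9 parameter count and DDR5-4800 speed are my assumptions for illustration, not measurements, and real throughput will land below this ceiling.

```python
# Rough upper bound on CPU token generation speed, assuming every weight
# is read from RAM once per generated token (memory-bandwidth-bound).

def bits_per_weight_q5_1(block_size: int = 32) -> float:
    # ggml q5_1 block: 32 weights at 5 bits each, plus an fp16 scale
    # and an fp16 minimum (16 bits each).
    return (block_size * 5 + 16 + 16) / block_size

def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    # Size of the quantized weights in gigabytes (1 GB = 1e9 bytes).
    return n_params * bits_per_weight / 8 / 1e9

def ddr_bandwidth_gbs(mega_transfers_per_s: float, channels: int = 2,
                      bus_bytes: int = 8) -> float:
    # Theoretical peak: transfers/s * channels * 8 bytes per 64-bit channel.
    return mega_transfers_per_s * 1e6 * channels * bus_bytes / 1e9

def max_tokens_per_s(bandwidth_gbs: float, size_gb: float) -> float:
    # Ceiling only; ignores KV cache reads and compute overhead.
    return bandwidth_gbs / size_gb

if __name__ == "__main__":
    bpw = bits_per_weight_q5_1()
    size = model_size_gb(30e9, bpw)      # assumed: 30e9 parameters
    bw = ddr_bandwidth_gbs(4800)         # assumed: dual-channel DDR5-4800
    print(f"{bpw} bits/weight, {size:.1f} GB, {bw:.1f} GB/s peak")
    print(f"ceiling: {max_tokens_per_s(bw, size):.2f} tokens/s")
```

Swap in your own memory speed and parameter count; faster DDR5 kits raise the ceiling proportionally.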