by amstan 1112 days ago
Woah, that's a cool direction. Thank you! I'll explore this.

P40s are kind of a meme. Running GGML models on a dual-channel DDR5 system gets you roughly the same performance as a P40 at a fraction of the wattage.

I still use GPTQ for 30B, but even a CPU generates quickly enough at q5_1 on modern hardware.
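
For anyone who wants to sanity-check the CPU claim, here is a rough back-of-envelope sketch. It assumes generation is memory-bandwidth bound, so every weight is read once per token, and it assumes DDR5-5600 as the memory speed (my pick, not something stated above). The q5_1 block layout (32 weights in 24 bytes) is my understanding of the GGML format, and "30B" is treated as a nominal 30e9 parameters. The function names are made up for illustration. Treat the result as a ceiling, not a benchmark.

```python
def bytes_per_weight_q5_1() -> float:
    """Approximate storage cost per weight for q5_1.

    Assumed block layout: 32 weights stored as 16 bytes of low 4-bit
    values, 4 bytes of high bits, a 2-byte fp16 scale and a 2-byte
    fp16 minimum, i.e. 24 bytes per 32 weights.
    """
    return (16 + 4 + 2 + 2) / 32


def dram_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (64-bit bus per channel)."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_second(n_params: float, bytes_per_weight: float, bandwidth_gbs: float) -> float:
    """Upper bound on tokens/s if each token reads every weight once."""
    model_gb = n_params * bytes_per_weight / 1e9
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    bw = dram_bandwidth_gbs(5600)  # assumed DDR5-5600, dual channel
    tps = max_tokens_per_second(30e9, bytes_per_weight_q5_1(), bw)
    print(f"peak bandwidth: {bw:.1f} GB/s, ceiling: {tps:.1f} tokens/s")
```

Real throughput will land below this ceiling, since peak DRAM bandwidth is never fully reached and prompt processing is compute bound rather than bandwidth bound.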