Hacker News
by
amstan
1112 days ago
Woah, that's a cool direction. Thank you! I'll explore this.
1 comment
washadjeffmad
1112 days ago
P40s are kind of a meme. Running GGML models on the CPU gets roughly the same performance at a fraction of the wattage on a dual-channel DDR5 system.
I still use GPTQ for 30B, but even a CPU generates quickly enough at q5_1 on modern hardware.
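For anyone who wants to sanity-check the CPU numbers, here is a rough back-of-envelope sketch. It assumes single-stream generation is limited by memory bandwidth (every weight is read once per token), which is a common rule of thumb and not something measured here. The DDR5-4800 speed and the 30e9 parameter count are illustrative assumptions, not figures from this thread, and the function names are made up for the example. The q5_1 size follows ggml's block layout: 32 weights at 5 bits each plus an fp16 scale and an fp16 minimum.

```python
def q5_1_bits_per_weight() -> float:
    # ggml q5_1 block: 32 weights x 5 bits, plus an fp16 scale and an fp16 min.
    return (32 * 5 + 16 + 16) / 32


def model_bytes(n_params: float, bits_per_weight: float) -> float:
    # Approximate size of the quantized weights in bytes.
    return n_params * bits_per_weight / 8


def ddr_bandwidth(mega_transfers_per_s: float, channels: int = 2,
                  bus_bytes: int = 8) -> float:
    # Theoretical peak bandwidth in bytes/s; each channel is 64 bits wide.
    return mega_transfers_per_s * 1e6 * channels * bus_bytes


def max_tokens_per_s(n_params: float, bits_per_weight: float,
                     bandwidth_bytes_per_s: float) -> float:
    # Upper bound only: assumes all weights are streamed once per token
    # and ignores compute, cache effects, and KV-cache traffic.
    return bandwidth_bytes_per_s / model_bytes(n_params, bits_per_weight)


if __name__ == "__main__":
    bpw = q5_1_bits_per_weight()
    bw = ddr_bandwidth(4800)  # assumed DDR5-4800, dual channel
    for n in (7e9, 13e9, 30e9):  # assumed round parameter counts
        print(f"{n / 1e9:.0f}B: <= {max_tokens_per_s(n, bpw, bw):.1f} tok/s")
```

Real throughput will land below these ceilings, so treat the output as a bound rather than a benchmark.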