by lxe 561 days ago
I'm still running oobabooga because of its exl2 (ExLlamaV2) support, which does much more efficient inference on dual 3090s.

I haven't touched ooba in a while. What's the situation like with exl2 vs. the non-homogeneous quantization methods people are using, like Q3_K_S or whatever? IIRC, while exl2 is faster, the GPTQ quants were outperforming it in terms of accuracy, esp. at lower bit depths.