
by valine 1142 days ago

The best of the best right now is probably Vicuna 13B. The 30B and 65B LLaMA models are better on benchmarks, but there isn’t a compelling instruction fine-tuned version of those yet, so they require a lot of prompt engineering.

If you want to run Vicuna without quantization you need about 25GB of VRAM, which exceeds pretty much all consumer GPUs. Vicuna 4-bit GPTQ is decent, though I personally notice a quality difference when comparing it to 16-bit.
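For a rough sense of where a number like that comes from, here is a back-of-the-envelope sketch. It assumes a nominal 13 billion parameters and counts only the weights (no activations, KV cache, or framework overhead), so real usage will differ somewhat. The function name and the 24GB comparison figure are illustrative assumptions, not anything from a specific library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count for a "13B" model.
VICUNA_13B_PARAMS = 13e9
# Assumption: VRAM on a high-end consumer card, used only for comparison.
CONSUMER_VRAM_GB = 24

fp16_gb = weight_memory_gb(VICUNA_13B_PARAMS, 16)  # unquantized 16-bit weights
int4_gb = weight_memory_gb(VICUNA_13B_PARAMS, 4)   # 4-bit quantized weights

for label, size_gb in [("16-bit", fp16_gb), ("4-bit", int4_gb)]:
    fits = "fits" if size_gb <= CONSUMER_VRAM_GB else "does not fit"
    print(f"{label}: ~{size_gb:.1f} GB of weights, {fits} in {CONSUMER_VRAM_GB} GB")
```

The same arithmetic works as a first check for CPU inference: compare the estimate against system RAM instead of VRAM.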

CPU is also an option: you can run pretty much any model that will fit in your RAM, although performance will obviously suffer. llama.cpp has gotten very popular.

1 comment

sudhirc 1142 days ago

Thanks for the advice.
