Hacker News
by washadjeffmad 1130 days ago
A (fine-tuned) model's inference quality is a function of parameters and inputs, so you'll need to know what it was trained on, including the prompt format, to prompt it correctly (this is usually in the model card). You'll also see huge differences in inference between front ends such as llama.cpp, oobabooga (text-generation-webui), etc.
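To make the prompt-format point concrete, here is a minimal sketch that wraps an instruction in the widely used Stanford Alpaca template (the no-input variant). Whether a given fine-tune expects exactly this template is an assumption; the model card is the authority. The helper name `build_alpaca_prompt` is hypothetical.

```python
# Stanford Alpaca prompt template (variant without an "Input" field).
# Assumption: the fine-tune you are running was trained on this format.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the Alpaca template (hypothetical helper)."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Explain what quantization does to a model."))
```

Sending the bare instruction without the wrapper often still produces text, but a model tuned on a specific template tends to follow instructions less reliably when the template is missing.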

I haven't benchmarked on Apple Silicon, but if you have the RAM, I'd recommend 30B SuperCOT ggml Q5_1 or a GPT-4-x-Alpaca variant. Because of the disparity in quality, I haven't used many models under 30B and so can't recommend one.
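For a rough idea of what "if you have the RAM" means, the sketch below estimates the size of the weights alone. It assumes that ggml Q5_1 stores each block of 32 weights in 24 bytes (about 6 bits per weight) and that the LLaMA "30B" model has roughly 32.5 billion parameters; both figures are my understanding rather than something stated in this thread, and the estimate ignores the KV cache and runtime overhead.

```python
def estimate_weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a Q5_1 block holds 32 weights in 24 bytes
# (fp16 scale + fp16 min + 4 bytes of high bits + 16 bytes of low nibbles).
Q5_1_BITS_PER_WEIGHT = 24 * 8 / 32

# Assumption: LLaMA "30B" has roughly 32.5e9 parameters.
LLAMA_30B_PARAMS = 32.5e9

if __name__ == "__main__":
    size = estimate_weights_gb(LLAMA_30B_PARAMS, Q5_1_BITS_PER_WEIGHT)
    print(f"Estimated weights: {size:.1f} GB")
```

Under these assumptions the weights come to a bit over 24 GB, so a 32 GB machine is a realistic minimum once context and the OS are accounted for.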

See rentry.org/lmg_models for a practical list and description.

1 comment

Thanks for the reply and the recommendations! I will see if my machine can handle some of the quantized 30B models.

I'm slightly confused about your comment about llama.cpp vs oobabooga. Doesn't text-generation-webui use llama.cpp underneath?

Also, huge thanks for the pointer to https://rentry.org/lmg_models. That's an invaluable resource.