| HN Mirror



	by orost 1188 days ago
	Nothing out there quite matches ChatGPT's quality, but you can get a similar experience by running an instruction-tuned derivative of LLaMA with llama.cpp. Try something like vicuna-13b-free or oasst-sft-6-llama-30b on for size. The former is trained on ChatGPT output with the refusals (the censorship) removed, which seems to have mostly worked. The latter is safety-trained, but from what I've heard (I need a hardware upgrade before I can run 30B models), much more mildly than ChatGPT. It should run on pretty much any hardware as long as you have enough memory. How much exactly depends on the quantization mode you choose (it's a quality-memory-speed tradeoff), but expect to need between 0.5 and 1 GB of memory per 1B parameters in the model.
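
	To make the rule of thumb concrete, here is a rough back-of-the-envelope sketch in Python. The function name `estimate_memory_gb` is made up for illustration, and the 0.5 and 1.0 GB-per-billion figures are just the bounds from the estimate above, not measured values; real usage also depends on context size and overhead.

```python
from typing import Tuple


def estimate_memory_gb(params_billions: float) -> Tuple[float, float]:
    """Return a (low, high) memory estimate in GB for a model.

    Assumes the rough rule of thumb of 0.5 to 1 GB per 1B parameters;
    the exact figure depends on the quantization mode chosen.
    """
    low = 0.5 * params_billions   # more aggressive quantization
    high = 1.0 * params_billions  # lighter quantization
    return low, high


if __name__ == "__main__":
    # The two models mentioned in the comment, by parameter count in billions.
    for name, size in [("vicuna-13b-free", 13), ("oasst-sft-6-llama-30b", 30)]:
        low, high = estimate_memory_gb(size)
        print(f"{name}: roughly {low:.1f} to {high:.1f} GB")
```

	So by this estimate a 13B model needs somewhere around 6.5 to 13 GB, and a 30B model around 15 to 30 GB.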

1 comment

acapybara 1188 days ago

llama.cpp is more of a raw library with a demo ./main program.

KoboldCpp builds on llama.cpp and provides a minimalist web UI. So if you want a chat assistant that runs on CPU via llama.cpp, KoboldCpp is probably what you want.