by orost
1141 days ago
There is nothing out there that quite matches ChatGPT quality, but you can get a similar kind of experience by running an instruction-tuned derivative of LLaMA with llama.cpp. Try something like vicuna-13b-free or oasst-sft-6-llama-30b on for size. The former is trained on ChatGPT output but with the refusals (the censorship) removed, which seems to have mostly worked. The latter is safety-trained, but from what I've heard (I need a hardware upgrade before I can run 30B models) much more mildly than ChatGPT. It should run on pretty much any hardware as long as you have enough memory. How much exactly depends on the quantization mode chosen (it's a quality-memory-speed tradeoff), but you should expect to need between 0.5 and 1 GB of memory per 1B parameters in the model.
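To make that rule of thumb concrete, here is a rough back-of-the-envelope sketch. It assumes the 0.5 GB end corresponds to roughly 4-bit quantization and the 1 GB end to roughly 8-bit, and it counts only the model weights. Real quantization formats also store scale factors, and the context (KV cache) plus runtime overhead add more on top, so treat these numbers as a lower bound rather than what llama.cpp will actually report.

```python
# Rough lower-bound estimate of weight memory for a quantized model.
# Assumptions: weights dominate memory; quantization scale factors,
# KV cache and runtime overhead are ignored; 1 GB = 1e9 bytes.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 params) * bytes_per_weight / 1e9 bytes per GB
    return params_billions * bytes_per_weight


if __name__ == "__main__":
    # Model sizes from the models mentioned above: 13B and 30B parameters.
    for name, size in [("13B model", 13), ("30B model", 30)]:
        for bits in (4, 8):  # assumed quantization levels
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(size, bits):.1f} GB")
```

By this estimate a 13B model needs somewhere around 6.5 to 13 GB and a 30B model around 15 to 30 GB, which is why the 30B models call for a beefier machine.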
KoboldCpp uses llama.cpp and provides a minimalist web UI. So if you want a chat assistant using llama.cpp on CPU, KoboldCpp is probably what you want.