|
|
|
|
|
by antirez
919 days ago
|
|
Use llama.cpp for quantized model inference. It is simpler (no Docker nor Python required), faster (works well on CPUs), and supports many models. Also there are better models than the one suggested. Mistral for 7B parameters. Yi if you want to go larger and happen to have 32Gb of memory. Mixtral MoE is the best but requires too much memory right now for most users. |
|