|
|
|
|
|
by ynniv
753 days ago
|
|
See llamafile (https://github.com/Mozilla-Ocho/llamafile), a standalone packaging of llama.cpp that runs an LLM locally. It will use the GPU, but falls back on the CPU. CPU-only performance of small, quantized models is still pretty decent, and the page lists estimated memory requirements for currently popular models. |
|
> wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF...
> chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile
> ./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile -ngl 999
https://euri.ca/blog/2024-llm-self-hosting-is-easy-now/