by ynniv 753 days ago
See llamafile (https://github.com/Mozilla-Ocho/llamafile), a standalone packaging of llama.cpp that runs an LLM locally. It uses the GPU when one is available and falls back to the CPU otherwise. CPU-only performance of small, quantized models is still pretty decent, and the project page lists estimated memory requirements for currently popular models.
1 comment

+100 to this. I don't think many people reading this thread realize how easy they've made it to run an LLM locally. It's a great start if you want to kick multiple tires (be careful to clean up, the gigs add up).

> wget https://huggingface.co/jartine/TinyLlama-1.1B-Chat-v1.0-GGUF...

> chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

> ./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile -ngl 999
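On the cleanup point: here's a rough sketch of how you might see what you've accumulated after trying a few models. The function name list_llamafiles is made up for illustration, and it assumes your downloads keep the .llamafile extension. It only lists files and sizes; it doesn't delete anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of llamafile): list every *.llamafile
# under a directory along with its size on disk.
list_llamafiles() {
    dir="${1:-.}"   # directory to scan; defaults to the current directory
    # -type f keeps regular files only; du -h prints a human-readable size
    # per file. With "+", du is not run at all when nothing matches.
    find "$dir" -type f -name '*.llamafile' -exec du -h {} +
}
```

Run it as list_llamafiles ~/Downloads (or wherever you ran wget), then rm the ones you're done with.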

https://euri.ca/blog/2024-llm-self-hosting-is-easy-now/