| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by trissi1996 544 days ago

Not really, llama.cpp can download for quite some time, not as elegant as ollama but:

    llama-server --model-url "https://huggingface.co/bartowski/DeepSeek-R1-Distill-Qwen-32B-GGUF/resolve/main/DeepSeek-R1-Distill-Qwen-32B-IQ4_XS.gguf"

Will get you up and running in one single command.

1 comments

yencabulator 544 days ago

And now you need a server per model? Ollama loads models on-demand, and terminates them after idle, all accessible over the same HTTP API.

link