| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by vunderba 78 days ago
	I'm using the default llama-server that is part of Gerganov's LLM inference system running on a headless machine with an nVidia 16GB GPU, but Ollama's a bit easier to ease into since they have a preset model library. https://github.com/ggml-org/llama.cpp