HN Mirror



	by cjbprime 806 days ago
	Wouldn't expect that to work at all.

1 comment

hedgehog 806 days ago

Ollama (which wraps llama.cpp) supports splitting a model across devices (for example, some layers on the GPU and the rest on the CPU), so you still get some acceleration even with models too big to fit entirely in GPU memory.
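To make the idea concrete, here is a minimal sketch of how a layer split might be planned. It assumes the model is a stack of equally sized layers and that some VRAM must be held back for other buffers (KV cache, scratch space). `plan_layer_split`, `SplitPlan` and `reserve_bytes` are hypothetical names for illustration only. This is not Ollama's or llama.cpp's actual placement logic, though the resulting count is roughly what llama.cpp's `--n-gpu-layers` option lets you set by hand.

```python
from dataclasses import dataclass


@dataclass
class SplitPlan:
    gpu_layers: int  # layers placed in GPU memory (accelerated)
    cpu_layers: int  # remaining layers evaluated on the CPU


def plan_layer_split(n_layers: int, layer_bytes: int, vram_bytes: int,
                     reserve_bytes: int = 0) -> SplitPlan:
    """Hypothetical planner: greedily put as many whole layers on the GPU as fit.

    Assumes every layer has the same size and that reserve_bytes of VRAM
    must stay free for non-weight buffers (an assumption, not a real default).
    """
    if n_layers < 0 or layer_bytes <= 0:
        raise ValueError("n_layers must be >= 0 and layer_bytes must be > 0")
    usable = max(vram_bytes - reserve_bytes, 0)
    gpu = min(n_layers, usable // layer_bytes)
    return SplitPlan(gpu_layers=gpu, cpu_layers=n_layers - gpu)


if __name__ == "__main__":
    GiB = 1024 ** 3
    # 80 layers of 0.5 GiB (40 GiB of weights) on a 24 GiB card, keeping 2 GiB free.
    print(plan_layer_split(80, GiB // 2, 24 * GiB, reserve_bytes=2 * GiB))
    # prints: SplitPlan(gpu_layers=44, cpu_layers=36)
```

Even a partial offload like this helps because the GPU-resident layers run faster. Overall throughput is still limited by the layers left on the CPU, which is why the speedup is only "some".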
