HN Mirror

Hacker News


	by zettabomb 323 days ago
	llama.cpp has built-in support for doing this, and it works quite well. Lots of people running LLMs on limited local hardware use it.

1 comment

llama.cpp can run some or all of a model's layers on the CPU, but the split is fixed when the model is loaded: it does not swap layers into the GPU as needed.
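To illustrate the difference between a static split and on-demand swapping, here is a minimal Python sketch. It is not llama.cpp code. In llama.cpp itself the split is controlled by the `-ngl` / `--n-gpu-layers` option. The names `plan_offload`, `forward`, and `LayerPlacement` are hypothetical. The choice to put the *last* N layers on the GPU is an assumption of this sketch. Every layer gets a device once, up front, and the forward pass never moves a layer's weights. Only the activations cross the CPU/GPU boundary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LayerPlacement:
    index: int
    device: str  # "gpu" or "cpu"


def plan_offload(n_layers: int, n_gpu_layers: int) -> list[LayerPlacement]:
    """Assign every layer to a device once, at load time.

    Assumption (hypothetical): the last n_gpu_layers layers go to the GPU,
    the remaining earlier layers stay on the CPU.
    """
    if n_layers < 0 or n_gpu_layers < 0:
        raise ValueError("layer counts must be non-negative")
    gpu_start = max(n_layers - n_gpu_layers, 0)
    return [
        LayerPlacement(i, "gpu" if i >= gpu_start else "cpu")
        for i in range(n_layers)
    ]


def forward(placements, x, run_layer):
    """Run the layers in order, each on the device it was assigned.

    No layer changes device here, so weights are never swapped in or out
    of GPU memory. The counter tracks how often the activations have to
    move between devices, which happens only at a CPU/GPU boundary.
    """
    transfers = 0
    current = "cpu"  # hypothetical: the input starts on the CPU
    for p in placements:
        if p.device != current:
            transfers += 1  # move activations, not weights
            current = p.device
        x = run_layer(p, x)
    return x, transfers


if __name__ == "__main__":
    plan = plan_offload(n_layers=32, n_gpu_layers=20)
    print(sum(p.device == "gpu" for p in plan))  # 20
    print(forward(plan, 0, lambda p, x: x + 1))  # (32, 1)
```

With 32 layers and 20 offloaded, layers 0 to 11 stay on the CPU and layers 12 to 31 stay on the GPU for the whole run. Only one activation transfer happens per pass. A swapping scheme would instead move weights in and out of VRAM during the pass, which is what the reply says llama.cpp does not do.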