	by RobMurray 63 days ago
	Why? It's mostly reads. The weights are static.

1 comment

llama.cpp's process is, but macOS itself will swap hard when 10-14 GB of memory is paged for LLM inference. Dense models especially would thrash zram.
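
For anyone following the "mostly reads" point, here is a rough sketch of the access pattern, assuming the weights are memory-mapped read-only from a file (llama.cpp does this by default). The helper names and the fake weights file are hypothetical, made up for illustration. Read-only mapped pages are clean and file-backed, so the OS can drop them without writing them to swap, but a dense model touches every page on every token, so once the file no longer fits in free RAM each pass has to fault pages back in.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_fake_weights(n_pages):
    """Write a hypothetical stand-in for a weights file: n_pages pages of 0x01 bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x01" * (n_pages * PAGE))
    return path


def touch_all_pages(path):
    """Map the file read-only and read one byte from every page.

    This mimics a dense forward pass, which needs every weight page
    resident at some point during each token.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            total = 0
            for offset in range(0, len(m), PAGE):
                total += m[offset]  # a read fault at most, never a dirty page
            return total, len(m)
```

Nothing here measures swap activity. It only shows that the process side of the workload is read-only, which matches the first half of the reply above.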