

	by abhikul0 67 days ago
	I'll try that, but llama-server has mmap on by default, and the process still uses as much RAM as the full model size. Not sure what's going on.

1 comment

Try running CPU-only inference to troubleshoot that. GPU layers will likely just ignore mmap.
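
One possible reason the RAM figure still matches the model size even with mmap on: pages of a mapped file are loaded lazily, but once they have been read they usually count as resident memory for the process, even though they are file-backed and the OS can drop them under pressure. A minimal Python sketch of that behavior, assuming nothing about llama.cpp internals (the helper names `make_dummy_file` and `touch_mapped_pages` are made up for illustration):

```python
import mmap
import os
import tempfile


def make_dummy_file(num_pages):
    """Create a temp file of num_pages pages as a stand-in for a model file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * (num_pages * mmap.PAGESIZE))
    return path


def touch_mapped_pages(path):
    """Map the file read-only and read one byte per page.

    Mapping alone loads nothing; each first read of a page makes the OS
    fault that page in from disk. Returns the number of pages touched.
    """
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), mmap.PAGESIZE):
                _ = mm[offset]  # first access pulls this page into memory
                touched += 1
    return touched
```

Inference reads essentially every weight, so after the first pass over the model most of the mapping may be resident. If that is what is happening here, the number a process monitor reports would look like the whole model sitting in RAM even though mmap is working as intended.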