Hacker News
abhikul0, 67 days ago
I'll try that, but llama-server has mmap enabled by default and the model still takes up its full size in RAM. Not sure what's going on.
zozbot234, 67 days ago
Try running CPU-only inference to troubleshoot that. GPU layers will likely just ignore mmap.
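One possible explanation for the RAM figure in the first comment, offered here as an assumption that the thread does not confirm: on Linux, pages of an mmap'd file that a process has touched are counted in its resident set size (as `RssFile`, or `RssShmem` when the file lives on tmpfs), even though they are file-backed page cache the kernel can reclaim, unlike anonymous memory (`RssAnon`). The minimal sketch below is Linux-only. It uses a hypothetical 32 MiB temporary file in place of a model, maps it read-only, reads one byte per page, and reports how the `Rss*` counters in `/proc/self/status` change.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
SIZE = 32 * 1024 * 1024  # hypothetical 32 MiB stand-in for a model file


def rss_kib() -> dict:
    """Parse RssAnon/RssFile/RssShmem from /proc/self/status (Linux only), in KiB."""
    fields = {}
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(("RssAnon:", "RssFile:", "RssShmem:")):
                name, value = line.split(":", 1)
                fields[name] = int(value.split()[0])  # value looks like "1234 kB"
    return fields


def touch_mapped_file(size: int = SIZE) -> tuple:
    """Map a temp file read-only, read one byte per page, return RSS (before, after)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x01" * size)
        with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            before = rss_kib()
            total = 0
            for offset in range(0, size, PAGE):
                total += mm[offset]  # page fault maps this page into the process
            after = rss_kib()
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    before, after = touch_mapped_file()
    for key in ("RssAnon", "RssFile", "RssShmem"):
        print(f"{key}: {before.get(key, 0)} kB -> {after.get(key, 0)} kB")
```

If the sketch behaves as expected, the file-backed counters grow by roughly the file size while `RssAnon` barely moves, so a tool that reports total RSS will show the whole mapping as "used" RAM even though the kernel can drop those pages under memory pressure.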