Hacker News
by cfn 1040 days ago
It's strange that it does that, given that there's plenty of free memory available on the system (it has 256 GB of RAM and wasn't running anything else).

Not really, it's just a question of accounting. The pages of an mmap'd file live in the kernel's page cache, so mmap is functionally the same as disk cache: as long as you've got the RAM, it'll run from RAM. If you really want, you can force llama.cpp not to use mmap and explicitly load everything into RAM, but there's not really any performance reason to do that. If the kernel keeps dropping your pages, you're under memory pressure anyway, and "locking" that memory will probably end up either thrashing or invoking the OOM killer.
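To make the "question of accounting" point concrete, here is a minimal Python sketch of the two loading strategies. It is not llama.cpp's loader (that is C++ and isn't shown here); the file is a hypothetical stand-in for model weights, and the helper names `load_with_mmap`, `load_with_read` and `touch_pages` are made up for illustration. Both strategies expose identical bytes. The difference is that mapped pages are clean, file-backed page-cache pages the kernel can drop and re-read, while the `read()` copy is private anonymous memory.

```python
import mmap
import os
import tempfile


def load_with_mmap(path: str) -> mmap.mmap:
    """Map the file read-only. Nothing is copied up front: pages are faulted
    in from the kernel's page cache on first access, and because they are
    clean and file-backed the kernel can drop them and re-read them later."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the mapping stays valid after close.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def load_with_read(path: str) -> bytes:
    """Copy the whole file into a private buffer (anonymous memory).
    Under memory pressure these pages can't simply be dropped; they can
    only be written out to swap, if any exists."""
    with open(path, "rb") as f:
        return f.read()


def touch_pages(buf, page_size: int = mmap.PAGESIZE) -> int:
    """Read one byte per page so every page becomes resident; return their sum."""
    return sum(buf[i] for i in range(0, len(buf), page_size))


if __name__ == "__main__":
    # Hypothetical stand-in for a model weights file.
    data = os.urandom(4 * 1024 * 1024)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        mapped = load_with_mmap(path)
        copied = load_with_read(path)
        # Same bytes either way; only where the memory is accounted differs.
        assert mapped[:] == copied == data
        print(touch_pages(mapped) == touch_pages(copied))
        mapped.close()
    finally:
        os.remove(path)
```

On Linux, after running the mmap variant, the touched pages show up under "buff/cache" in `free` rather than as memory the process exclusively owns, which is why a machine with lots of RAM can look like it is "not using" memory for the model.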