by losteric 1179 days ago
Yeah, I believe some readers are misinterpreting the report. The OS manages mmap'd memory, so it won't show up as "regular" memory utilization: pages are lazy-loaded on first access and managed automatically by the kernel. If the OS can keep the whole file in memory, it will. Under memory pressure it will generally drop those file-backed pages (they can just be re-read from disk) and prioritize explicit memory allocations (malloc).
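
To make the lazy-loading point concrete, here is a minimal sketch using Python's standard mmap module. It is not llama.cpp's code; the helper names (make_demo_file, read_byte_via_mmap) and the demo file are made up for illustration. Mapping the file reserves address space, and on typical systems the kernel only pulls a page into memory when a byte in it is actually touched.

```python
import mmap
import os
import tempfile


def make_demo_file(size: int) -> str:
    """Write a throwaway file with a predictable byte pattern (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 251 for i in range(size)))
    return path


def read_byte_via_mmap(path: str, offset: int) -> int:
    """Map the whole file read-only and return the byte at `offset`."""
    with open(path, "rb") as f:
        # length=0 maps the entire file; creating the mapping does not copy it into RAM
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Indexing touches a single page; the kernel faults it in on demand
            return mm[offset]
```

Because the mapping is read-only and backed by the file, the kernel can discard those pages whenever it needs the memory and fault them back in later, which is why tools that report only anonymous (malloc-style) memory can make the process look smaller than the file it is reading.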

Sounds like the big win from the optimizations is load time. Also, maybe llama.cpp now supports low-memory systems through mmap swapping? ... At the end of the day, 30B quantized is still 19GB...