by bigyabai 64 days ago
llama.cpp's process is, but macOS itself will swap hard once 10-14 GB of memory is paged in for LLM inference. Dense models especially would thrash the memory compressor (macOS's counterpart to zram).