Hacker News new | ask | show | jobs
by zozbot234 322 days ago
iGPUs (and NPUs) are not very useful for LLM inference, they only help somewhat in the prompt pre-processing phase. The CPU has worse bulk compute but far better access to system memory bandwidth, so it wins in token generation where that's the main factor.

My conspiracy theory is that it would help if contributors kept the Vulkan Compute proposed support up to date with new Ollama versions; no maintainer wants to deal with out-of-date pull req's.