Hacker News
by oreoftw 389 days ago
Most likely he was referring to the fact that you need plenty of fast memory to hold the model, and GPU cards have it.
1 comment

There is nothing magical about GPU memory though. It’s just faster, i.e. it has higher bandwidth. People have been doing CPU inference since the first llama code came out.
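
To put rough numbers on "just faster", here is a back-of-envelope sketch in Python. It assumes a dense model where generating each token reads every weight once, so memory bandwidth sets an upper bound on tokens per second. The 7B parameter count and the 50 GB/s and 900 GB/s bandwidth figures are illustrative assumptions, not measurements of any particular hardware, and the function names are made up for this example.

```python
# Back-of-envelope estimate: how much memory the weights need, and how
# memory bandwidth caps single-stream generation speed.
# All concrete numbers below are illustrative assumptions.

GB = 1e9  # decimal gigabyte


def weight_bytes(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights (ignores KV cache, activations)."""
    return n_params * bytes_per_param


def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound, assuming every weight is read once per generated token."""
    return bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    n_params = 7e9  # assumed model size

    sizes = {
        "16-bit weights": weight_bytes(n_params, 2.0),
        "4-bit weights": weight_bytes(n_params, 0.5),
    }
    bandwidths = {
        "CPU RAM (assumed 50 GB/s)": 50 * GB,
        "GPU VRAM (assumed 900 GB/s)": 900 * GB,
    }

    for size_name, size in sizes.items():
        print(f"{size_name}: {size / GB:.1f} GB")
        for mem_name, bw in bandwidths.items():
            bound = max_tokens_per_second(size, bw)
            print(f"  {mem_name}: at most ~{bound:.0f} tokens/s")
```

Under those assumptions the same model runs in either kind of memory; the GPU's ceiling is just higher by the ratio of the two bandwidths. Real throughput will be lower than these bounds once compute, caches, and overhead are counted.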