HN Mirror



	by oreoftw 389 days ago
	Most likely he was referring to the fact that you need plenty of fast memory to hold the model, and GPU cards have it.


There is nothing magical about GPU memory, though. It's just faster. But people have been doing CPU inference since the first LLaMA code came out.
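A rough back-of-the-envelope sketch of both points: how much memory the weights alone take, and why "just faster" memory matters. It assumes 1 GB = 1e9 bytes, counts only the weights (no KV cache, activations, or runtime overhead), and treats decoding as memory-bandwidth-bound, where each generated token streams every weight once. The 7B model size and the bandwidth figures are hypothetical placeholders, not numbers from the thread, and `weight_memory_gb` / `max_tokens_per_sec` are illustrative helper names.

```python
# Back-of-the-envelope sizing for holding LLM weights in memory.
# Assumptions (illustrative, not from the thread): 1 GB = 1e9 bytes,
# and only the weights are counted (no KV cache, activations, or overhead).

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory needed just to store the weights, in GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def max_tokens_per_sec(weights_gb: float, bandwidth_gb_per_s: float) -> float:
    """Rough upper bound on decode speed, assuming every generated token
    must read all weights from memory once (memory-bandwidth-bound)."""
    return bandwidth_gb_per_s / weights_gb


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for dtype in ("fp16", "int4"):
        gb = weight_memory_gb(params, dtype)
        # Placeholder bandwidths, not measurements of any specific hardware.
        for label, bw in (("CPU ~50 GB/s", 50.0), ("GPU ~1000 GB/s", 1000.0)):
            tps = max_tokens_per_sec(gb, bw)
            print(f"{dtype}: {gb:.1f} GB of weights, {label}: <= {tps:.0f} tok/s")
```

Under these assumptions the same weights fit in ordinary system RAM just as well as in VRAM; what changes is the ceiling on tokens per second, which scales with memory bandwidth.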