| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by LoganDark 801 days ago
	> Common LLM runners can split model layers between VRAM and system RAM; a PC rig with a 4090 can do inference on models larger than 24G. Sure and ASi can do inference on models larger than the Unified Memory if you account for streaming the weights from the SSD on-demand. That doesn't mean it's going to be as fast as keeping the whole thing in RAM, although ASi SSDs are probably not particularly bad as far as SSDs go.