	by Gareth321 12 days ago
	System RAM has much lower bandwidth and less predictable access than GPU memory. Notably, the transfer from system RAM to the GPU is very slow, about 30x slower. LLMs aren't designed to queue or parallelise operations to account for this, so they just become much slower.
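
	To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes generation is purely bandwidth-bound, i.e. every generated token has to read all of the weights once (dense model, batch size 1). The model size and the GPU memory bandwidth are made-up illustrative figures, not measurements; only the 30x ratio comes from the comment above.

```python
def tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if every token streams all weights once."""
    return bandwidth_bytes_per_s / weight_bytes


# Illustrative assumptions, not measurements.
WEIGHT_BYTES = 14e9                    # assumed: ~7B parameters at 2 bytes each
VRAM_BANDWIDTH = 900e9                 # assumed GPU memory bandwidth, bytes/s
LINK_BANDWIDTH = VRAM_BANDWIDTH / 30   # "about 30x slower", per the comment

in_vram = tokens_per_second(WEIGHT_BYTES, VRAM_BANDWIDTH)
over_link = tokens_per_second(WEIGHT_BYTES, LINK_BANDWIDTH)

print(f"weights resident in VRAM:         {in_vram:.1f} tokens/s")
print(f"weights streamed from system RAM: {over_link:.1f} tokens/s")
```

	Under these assumptions the slowdown is exactly the bandwidth ratio, since nothing overlaps the transfer with compute.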