	by macNchz 877 days ago
	> Specifically if there's a complexity tax in offloading that makes the CPU-alone faster

	Anecdotal, but I recently played with a bunch of models on a machine with a 16GB AMD GPU, a 12-core CPU, and 64GB of system memory. I found that offloading sped things up significantly with large models, but there seemed to be an inflection point: as I tested models that approached the limits of the system, offloading slowed things down significantly compared with just running on the CPU.