| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bigyabai 4 hours ago
	The tradeoff is that the M3 Ultra's GPU loses to laptop GPUs in compute benchmarks. All of that bandwidth is wasted idling for token prefill. For inference workloads, it makes a lot more sense to optimize for prefill/ttft before maxing out memory bandwidth.