| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by shubhamintech 85 days ago
	4.4 tok/s with reliable structured output is a solid local benchmark altho the question is whether SSD streaming introduces per-token latency variance that messes up tool call parsing downstream. The gap between 400 GB/s unified memory bandwidth and 17.5 GB/s SSD reads means you're in the hot path pretty much every time an expert isn't cached.