| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Art9681 106 days ago
	It's going to be faster no matter what. My M3 MAX prints tokens faster than I can read for the new MoE models. It's the prompt processing that kills it when the context grows beyond a threshold which is easy to do in the modern agentic loops.

1 comments

If your computer was faster at it, you could run more capable models at the same token rate.