| HN Mirror

	by wing-_-nuts 103 days ago
	I see it making claims about 10x efficiency, but what are the tokens/second/watt? The only machines I've seen with the memory bandwidth to do local inference effectively are the M-series Arm chips in Macs.
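
For anyone wanting to put numbers on that question, here is a rough sketch of the arithmetic. Tokens/second/watt is just tokens per joule, and a common rule of thumb is that single-stream decoding is limited by memory bandwidth, since each generated token has to read roughly all of the model weights once. The function names and the example figures (400 GB/s, 40 GB of weights, 50 W) are made-up placeholders for illustration, not measurements of any real machine.

```python
def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Tokens/second/watt, which equals tokens per joule (1 W = 1 J/s)."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tokens_per_second / watts


def bandwidth_bound_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough ceiling for single-stream decoding.

    Assumes every generated token reads all model weights from memory once,
    so throughput cannot exceed bandwidth divided by model size.
    """
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    # Hypothetical placeholder figures, not benchmarks.
    ceiling = bandwidth_bound_tokens_per_second(bandwidth_gb_s=400.0, model_gb=40.0)
    efficiency = tokens_per_joule(ceiling, watts=50.0)
    print(f"decode ceiling: {ceiling:.1f} tokens/s")
    print(f"efficiency: {efficiency:.2f} tokens/s/W")
```

Real throughput will land below this ceiling, and batching or compute-bound prompt processing changes the picture, so treat it as a sanity check on efficiency claims rather than a prediction.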