| HN Mirror



	by kolinko 650 days ago
	It depends on the chip architecture. Etched claims 0.5M tok/s with weights that can be updated. The main constraint is the model architecture: it has to be a specific transformer-based model. But they claim the chip can run both Mixtral and Llama, so the constraints are not too strict.