| HN Mirror



	by throwthrowuknow 687 days ago
	That’s fine if you never need to improve the model, which is valid in some use cases, but for chat-style interaction or even code generation you’ll regularly have to update the weights.

1 comment

kolinko 686 days ago

Depends on the chip architecture: Etched claims 0.5M tok/s with weights that can be updated. The main constraint is the model architecture, which has to be a specific transformer-based model. But they claim the chip can run both Mixtral and Llama, so the constraints are not too stiff.
