Hacker News new | ask | show | jobs
by throwthrowuknow 640 days ago
That’s fine if you never need to improve the model, which is valid in some use cases, but for chat style interaction or even code generation you’ll regularly have to update the weights.
1 comments

Depends on a chip architecture - etched claims 0.5M tok/s with weights that can be updated. The main constraint is with the model architecture, where it needs to be specific transformer-based model. But they claim the chip can do both Mixtral and Llama - so the constraints are not too stiff.