by kolinko
650 days ago
Depends on the chip architecture. Etched claims 0.5M tok/s with weights that can be updated. The main constraint is the model architecture: it has to be a specific transformer-based model. But they claim the chip can run both Mixtral and Llama, so the constraints are not too strict.