by kolinko
640 days ago
The chips are optimised for matmuls, but not for the transformer architecture per se. With dedicated ASICs, and with the weights hardcoded (or stored in SRAM), we could theoretically get 1 token per clock cycle, so millions or even billions of tokens per second, not hundreds. Etched, for example, claims to have a chip reaching 500k tok/s in the works. That is still far from the theoretical maximum with current technology. Bitcoin mining followed a similar path from GPUs to FPGAs to ASICs, and today's ASICs are millions of times faster than GPUs.
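A rough back-of-the-envelope sketch of the "1 token per cycle" arithmetic: peak throughput is simply clock rate × tokens per cycle. The clock rates below are hypothetical round numbers, not specs of any real chip, and the result is aggregate throughput assuming the pipeline stays full (many requests in flight), not per-user generation speed. The 500k tok/s figure is the Etched claim mentioned above.

```python
ETCHED_CLAIM_TPS = 500_000  # claimed tok/s, as quoted in the comment

def tokens_per_second(clock_hz: float, tokens_per_cycle: float = 1.0) -> float:
    """Peak throughput of a hypothetical chip emitting `tokens_per_cycle` tokens each clock cycle."""
    return clock_hz * tokens_per_cycle

# Hypothetical clock rates (assumptions for illustration only).
for clock_hz in (100e6, 1e9, 2e9):
    tps = tokens_per_second(clock_hz)
    ratio = tps / ETCHED_CLAIM_TPS  # how far above the 500k tok/s claim this would be
    print(f"{clock_hz / 1e9:.1f} GHz -> {tps:,.0f} tok/s ({ratio:,.0f}x 500k tok/s)")
```

Even at a modest 100 MHz, one token per cycle would be 100M tok/s, about 200x the 500k tok/s figure.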