Ahhh, so is this a chip "more optimised" for connecting GPUs to reality ... or are they skipping the GPU step entirely? Are GPUs only for training now?
Is this an ASIC? Or FPGA? Or something even more exotic?
I’m guessing it’s some form of ASIC because I can’t imagine crafting the logic of Llama on silicon is a very quick or easy job. Not that doing it on an ASIC is a piece of cake either.
"Taalas is borrowing some ideas from the structured ASICs of the early 2000s to make its hardwired model-specific chips. Structured ASICs used gate arrays and hardened IP blocks, changing only the interconnect layers to adapt the chip to a specific workload. At the time, this was seen as a more cost-effective alternative to a full-custom ASIC that was more performant than an FPGA."
"Taalas changes only two masks to customize a chip for a specific model, but the two masks can change both model weights and dataflow through the chip. On the HC1, the model and its weights are stored on the chip using a mask-ROM-based recall fabric paired with a (programmable) SRAM, which can be used to hold fine-tuned weights and/or the KV cache. Future generations of chips may split the SRAM onto a separate chip, meaning they could be denser than the HC1."
It's quite impressive what purpose-built inference can/will do once everyone stops trying to become kind of the best model.