| HN Mirror



	by RantyDave 95 days ago
	Ahhh, so is this a chip "more optimised" for connecting GPUs to reality ... or are they skipping the GPU step entirely? Are GPUs only for training now?


cyanydeez 95 days ago

have you seen this: https://chatjimmy.ai/

It's quite impressive what purpose-built inference can/will do once everyone stops trying to become king of the best model.


redwood 95 days ago

Wow impressive. What's the story with this?


jffry 95 days ago

It's a tech demonstrator for a company that turns models into custom silicon for fast inference. In this case, the model is Llama 3.1 8B: https://taalas.com/products/


gizajob 94 days ago

Is this an ASIC? Or FPGA? Or something even more exotic?

I’m guessing it’s some form of ASIC because I can’t imagine crafting the logic of Llama on silicon is a very quick or easy job. Not that doing it on an ASIC is a piece of cake either.


jffry 94 days ago

An ASIC is custom silicon, no?

Anyways, I found this article discussing it a bit more: https://www.eetimes.com/taalas-specializes-to-extremes-for-e...

"Taalas is borrowing some ideas from the structured ASICs of the early 2000s to make its hardwired model-specific chips. Structured ASICs used gate arrays and hardened IP blocks, changing only the interconnect layers to adapt the chip to a specific workload. At the time, this was seen as a more cost-effective alternative to a full-custom ASIC that was more performant than an FPGA."

"Taalas changes only two masks to customize a chip for a specific model, but the two masks can change both model weights and dataflow through the chip. On the HC1, the model and its weights are stored on the chip using a mask-ROM-based recall fabric paired with a (programmable) SRAM, which can be used to hold fine-tuned weights and/or the KV cache. Future generations of chips may split the SRAM onto a separate chip, meaning they could be denser than the HC1."


hmartin 95 days ago

Taalas hardware implementation of Llama 3.1 8B. They claim 16k tok/s vs. Cerebras at 2k. https://taalas.com/products/
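
For a rough sense of scale, here is the arithmetic on those two claimed numbers as a small sketch. The 16k and 2k tok/s figures are the vendor claims quoted above, not measurements, and the thread does not say whether they are per-stream or aggregate; the function and constant names are just made up for illustration.

```python
# Back-of-the-envelope comparison of the throughput figures quoted above.
# Both numbers are claims from the thread, not benchmarks.

def per_token_latency_us(tokens_per_second: float) -> float:
    """Average time per token in microseconds, assuming steady throughput."""
    return 1_000_000 / tokens_per_second


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Ratio of the two throughput figures."""
    return fast_tps / slow_tps


# Hypothetical constant names; values are the claimed tok/s from the comment.
TAALAS_CLAIMED_TPS = 16_000
CEREBRAS_CLAIMED_TPS = 2_000

if __name__ == "__main__":
    ratio = speedup(TAALAS_CLAIMED_TPS, CEREBRAS_CLAIMED_TPS)
    print(f"claimed speedup: {ratio:.1f}x")
    print(f"Taalas:   {per_token_latency_us(TAALAS_CLAIMED_TPS):.1f} us/token")
    print(f"Cerebras: {per_token_latency_us(CEREBRAS_CLAIMED_TPS):.1f} us/token")
```

If the claims hold, that is roughly an 8x gap, or about 62.5 us per token against 500 us per token on average.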
