| HN Mirror



	by Rinzler89 687 days ago
	It's already there. Have you seen the six-figure AI chips that Nvidia is selling to data center customers? Those chips are not GPUs: they can't draw a single triangle or map a single texture, they're AI accelerators all the way. Do people still think Nvidia is selling gaming GPUs for AI workloads like it's 2018? Google, Meta, et al. are working on their own AI chips, but those chips will have to beat Nvidia's at Performance and TCO, and Nvidia shows no signs of slowing down to let competitors catch up.

2 comments

kolinko 687 days ago

The chips are optimised for matmuls, but not for the transformer architecture per se. With dedicated ASICs and weights hardcoded (or stored in SRAM), we could theoretically get one token per clock cycle, so millions or billions of tokens per second, not hundreds.

Etched, for example, claims to have a chip in the works that reaches 500k tok/s. That is still far from the theoretical maximum with current technology.
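
As a rough sanity check on the one-token-per-cycle arithmetic, here is a minimal sketch. The clock rates (1 MHz and 1 GHz) are illustrative assumptions, not specs of any real chip, and `tokens_per_second` is a hypothetical helper; the only figure taken from the thread is the 500k tok/s claim.

```python
# Back-of-the-envelope estimate only. Clock rates are assumed for illustration;
# real chips would be limited by memory, pipelining, and model depth.

def tokens_per_second(clock_hz: float, cycles_per_token: float = 1.0) -> float:
    """Tokens emitted per second if each token takes `cycles_per_token` clock cycles."""
    return clock_hz / cycles_per_token

# Throughput figure quoted in the comment above.
ETCHED_CLAIM_TOK_S = 500_000

for label, clock_hz in [("1 MHz", 1e6), ("1 GHz", 1e9)]:
    ideal = tokens_per_second(clock_hz)
    ratio = ideal / ETCHED_CLAIM_TOK_S
    print(f"{label}: {ideal:,.0f} tok/s ideal, {ratio:,.0f}x the 500k tok/s claim")
```

Under these assumptions, the idealised ceiling sits somewhere between a few times and a few thousand times the claimed figure, depending entirely on the clock rate you assume.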

A similar progression played out with Bitcoin mining, which moved from GPUs to FPGAs to ASICs: the current ASICs are millions of times faster than GPUs.


throwthrowuknow 687 days ago

That’s fine if you never need to improve the model, which is valid in some use cases, but for chat style interaction or even code generation you’ll regularly have to update the weights.


kolinko 686 days ago

Depends on the chip architecture. Etched claims 0.5M tok/s with weights that can be updated. The main constraint is on the model architecture, which needs to be a specific transformer-based model. But they claim the chip can run both Mixtral and Llama, so the constraints are not too stiff.


matwood 687 days ago

> beat Nvidia's at Performance and TCO

TCO, yes. Raw performance, not necessarily. TCO will attack NVDA's margins. When Meta last wrote about their cluster, it was presented as having power equivalent to X NVDA chips. They are already bringing their own chips into the mix.
