Hacker News
by Rinzler89 640 days ago
It's already there. Have you seen the six-figure AI chips that Nvidia is selling to data center customers? Those chips are not GPUs: they can't draw a single triangle or map a single texture, they're AI accelerators all the way. Do people still think Nvidia is selling gaming GPUs for AI workloads like it's 2018?

Google, Meta, et al. are working on their own AI chips, but those chips will have to beat Nvidia's at Performance and TCO, and Nvidia shows no signs of slowing down to let competitors catch up.

2 comments

The chips are optimised for matmuls, but not for the transformer architecture per se. With dedicated ASICs and weights hardcoded (or stored in SRAM), we could theoretically get one token per cycle: millions or billions of tokens per second, not hundreds.

Etched, for example, claims they have a chip reaching 500k tok/s in the works, which is still far from the theoretical max with current technology.
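
For a rough sense of that gap, here is a back-of-envelope sketch of the one-token-per-cycle arithmetic. The clock rates are illustrative assumptions on my part, not figures from any vendor, and the function name is made up for the example.

```python
# Back-of-envelope: tokens/s if a chip emitted one token per clock cycle.
# The clock rates below are illustrative assumptions, not vendor figures.

def tokens_per_second(clock_hz: float, tokens_per_cycle: float = 1.0) -> float:
    """Throughput under the idealised one-token-per-cycle assumption."""
    return clock_hz * tokens_per_cycle

ETCHED_CLAIMED_TOK_S = 500_000  # the 500k tok/s claim mentioned above

for clock_hz in (1e6, 1e9):  # hypothetical 1 MHz and 1 GHz clocks
    ideal = tokens_per_second(clock_hz)
    ratio = ideal / ETCHED_CLAIMED_TOK_S
    print(f"{clock_hz:.0e} Hz -> {ideal:.0e} tok/s ({ratio:g}x the claimed figure)")
```

Under these assumptions, a hypothetical 1 GHz part at one token per cycle would be 2000x the claimed 500k tok/s. This ignores memory bandwidth, batching, and sequence length entirely, so treat it as an upper bound on an idealised design rather than a prediction.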

A similar progression played out with Bitcoin mining, from GPUs to FPGAs to ASICs: the current ASICs are millions of times faster than GPUs.

That’s fine if you never need to improve the model, which is valid in some use cases, but for chat-style interaction or even code generation you’ll regularly have to update the weights.

Depends on the chip architecture: Etched claims 0.5M tok/s with weights that can be updated. The main constraint is the model architecture, which needs to be a specific transformer-based model. But they claim the chip can run both Mixtral and Llama, so the constraints are not too tight.

> beat Nvidia's at Performance and TCO

TCO, yes. Raw performance, not necessarily. TCO will attack NVDA's margins. When Meta last wrote about their cluster it was presented as power equivalent to X NVDA chips. They are already bringing their own chips into the mix.