by andy_xor_andrew 748 days ago
When reading Hacker News you develop a signal/noise filter, where lots of headlines make bold claims but you filter them out as embellishment or exaggeration.

My bullshit detector went off when I first saw Groq posted on HN - a startup is making their own chips (doubt) that perform faster than anything Nvidia has for inference (doubt) and accelerate LLMs to hundreds/thousands of tokens per second?? Mega doubt.

But... then I tried their demo, and... yeah, it's that good. Such an amazing company of talented individuals.

2 comments

The issue is that their chips need a huge number of server blades, and there's a big doubt whether this model actually scales. That is, how will Groq handle much larger models with a context of hundreds of thousands or millions of tokens? Right now this would require them to deploy a cluster with thousands of chips, versus 10 chips for, say, an Nvidia system.
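
To put rough numbers on that chip-count argument, here is a back-of-envelope sketch. Everything in it is an assumption for illustration, not a figure from this thread: the per-chip memory sizes (about 230 MB of on-chip SRAM versus about 80 GB of HBM), the hypothetical 70B-parameter model with 8-bit weights, and the layer/head shapes. It only counts memory capacity and ignores activations, replication, and interconnect, so treat the output as an order-of-magnitude guide at best.

```python
import math


def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV-cache size for one sequence (keys and values, hence the 2x)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


def chips_needed(weight_bytes, cache_bytes, mem_per_chip_bytes):
    """Minimum chips whose combined memory holds the weights plus the KV cache."""
    return math.ceil((weight_bytes + cache_bytes) / mem_per_chip_bytes)


# Hypothetical model: 70B parameters at 1 byte each (assumed, not from the thread).
WEIGHT_BYTES = 70e9
# Hypothetical shapes: 80 layers, 8 KV heads, head dimension 128, 100k-token context.
CACHE_BYTES = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, context_tokens=100_000)

# Assumed per-chip memory: SRAM-only accelerator vs HBM-backed GPU.
SRAM_CHIP_BYTES = 230e6
HBM_CHIP_BYTES = 80e9

if __name__ == "__main__":
    print("SRAM-only chips:", chips_needed(WEIGHT_BYTES, CACHE_BYTES, SRAM_CHIP_BYTES))
    print("HBM-backed chips:", chips_needed(WEIGHT_BYTES, CACHE_BYTES, HBM_CHIP_BYTES))
```

Under these assumptions the gap is a couple of orders of magnitude, and the KV cache term grows linearly with context length, which is where the long-context worry comes from.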

The other issue they don't mention is power, space, efficiency etc. We want to run larger models with less power, fewer server blades, at lower cost. Not use more server blades, more chips, more power, etc.

Cerebras faces similar challenges with their wafer-scale chips.

If anything, Google's TPU advancements chart a viable course. I suspect both Groq and Cerebras will overcome the challenges and offer competitive compute options, depending on the context.

SambaNova is the only one of the chip startups that is viable. It surprises me that people don’t see this.
8-year-old unicorn++ with a public demo sounds credible?