Hacker News new | ask | show | jobs
by GamerAlias 910 days ago
In case, it's not blinding obvious to people. Groq are a hardware company that have built chips that are designed around the training and serving of machine models particularly targeted at LLMs. So the quality of the response isn't really what we're looking for here. We're looking for speed i.e. tokens per second.

I actually have a final round interview with a subsidiary of Groq coming up and I'm very undecided as to whether to pursue it so this felt extraordinarily serendipitous to me. Food for thought shown here

3 comments

tbh anyone can build fast hw for a single model, I’d audit their plan for a SW stack before joining. That said their arch is pretty unique so if they’re able to get these speeds it is pretty compelling
Our hardware architecture was not designed with LLMs in mind, let alone a specific model. It's a general purpose numerical compute fabric. Our compiler allows us to quickly deploy new models of any architecture without the need that graphics processors have for handwritten kernels. We run language models, speech models, image generation models, scientific numerical programs including for drug discovery, ...
They are putting the whole LLM into SRAM across multiple computing chips, IIRC. That is a very expensive way to go about serving a model, but should give pretty great speed at low batch size.
> the quality of the response isn't really what we're looking for here. We're looking for speed i.e. tokens per second.

But if it was generating high-quality responses, would that not make it go slower?

That would involve using a different model. This is not about the model, it’s about the relative speed improvement from the hardware, with this model as a demo.