HN Mirror

Hacker News


	by GamerAlias 910 days ago
	In case it's not blindingly obvious to people: Groq is a hardware company that has built chips designed around the training and serving of machine learning models, particularly targeted at LLMs. So the quality of the response isn't really what we're looking for here. We're looking for speed i.e. tokens per second. I actually have a final-round interview with a subsidiary of Groq coming up, and I'm very undecided about whether to pursue it, so this felt extraordinarily serendipitous to me. Food for thought here.

mlazos 910 days ago

Tbh anyone can build fast hw for a single model; I’d audit their plan for a SW stack before joining. That said, their arch is pretty unique, so if they’re able to get these speeds it is pretty compelling.

tome 910 days ago

Our hardware architecture was not designed with LLMs in mind, let alone a specific model. It's a general-purpose numerical compute fabric. Our compiler allows us to quickly deploy new models of any architecture without the handwritten kernels that graphics processors need. We run language models, speech models, image generation models, scientific numerical programs including for drug discovery, ...

pclmulqdq 910 days ago

They are putting the whole LLM into SRAM across multiple computing chips, IIRC. That is a very expensive way to go about serving a model, but should give pretty great speed at low batch size.
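
For a rough sense of why holding a whole model in SRAM is expensive, here is a back-of-envelope sketch. The model size, bytes per weight, and the 230 MB of SRAM per chip are all assumptions chosen for illustration, not figures from this thread, and the count covers weights only (no KV cache or activations), so a real deployment would need more chips.

```python
import math


def chips_needed(n_params: float, bytes_per_param: float, sram_bytes_per_chip: float) -> int:
    """Minimum number of chips whose combined SRAM can hold the weights alone.

    Ignores KV cache, activations, and any replication, so this is a lower bound.
    """
    weight_bytes = n_params * bytes_per_param
    return math.ceil(weight_bytes / sram_bytes_per_chip)


# Hypothetical example: a 70B-parameter model stored at 1 byte per weight,
# with an assumed 230 MB of on-chip SRAM per chip.
print(chips_needed(70e9, 1, 230e6))  # 305
```

Hundreds of chips for a single model copy is where the cost comes from; the upside is that weights never have to be streamed from off-chip memory, which is what dominates per-token latency when the batch size is small.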

chihuahua 910 days ago

> the quality of the response isn't really what we're looking for here. We're looking for speed i.e. tokens per second.

But if it were generating high-quality responses, wouldn't that make it go slower?

nomel 910 days ago

That would involve using a different model. This is not about the model, it’s about the relative speed improvement from the hardware, with this model as a demo.
