Hacker News
by
sergiotapia
389 days ago
The only reason they are fast is that the models they host are severely quantized, or so I've heard.
3 comments
jacob019
389 days ago
Huh. I heard a podcast with the founder talking about their custom hardware, but quantization would explain it.
christianqchung
389 days ago
Quantization alone does not explain it. It's mostly custom hardware[0].
[0] https://groq.com/the-groq-lpu-explained/
zargon
389 days ago
Why repeat this nonsense when it's so trivial to just check? The reason Groq is fast is that they employ absolutely ludicrous amounts of SRAM (which is 10 times faster than the fastest VRAM).
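To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. It assumes single-stream decoding is limited by how fast the weights can be read from memory, so tokens per second is at most bandwidth divided by weight bytes. The function name, the model size, and the bandwidth figures are all illustrative assumptions, not Groq's or any GPU vendor's numbers.

```python
def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on decode speed if every token requires reading all weights once.

    Ignores compute, KV cache reads, batching, and interconnect overhead.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / weight_bytes


# Hypothetical 70B-parameter model; bandwidth values are placeholders.
baseline = decode_tokens_per_second(70, 2, 3_000)       # 16-bit weights, 3 TB/s memory
fast_memory = decode_tokens_per_second(70, 2, 30_000)   # same weights, 10x the bandwidth
fp8_only = decode_tokens_per_second(70, 1, 3_000)       # 8-bit weights, original bandwidth

print(f"baseline:     {baseline:.1f} tok/s")
print(f"10x memory:   {fast_memory:.1f} tok/s")
print(f"fp8 weights:  {fp8_only:.1f} tok/s")
```

Under these assumptions, halving the weight precision doubles the bound, while a 10x bandwidth increase raises it tenfold, which is one way to see why quantization alone would not account for a large speed gap.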
behnamoh
389 days ago
They responded to my tweet last year and said they didn't quantize the models.
boroboro4
389 days ago
It's very hard to find right now, but I'm sure they said they don't quantize the KV cache, while their weights are in FP8.
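For a sense of what the weights versus KV cache distinction means in memory terms, here is a small sketch. The model dimensions are hypothetical (chosen to resemble a large grouped-query-attention transformer) and do not describe any specific hosted model; the helper names are made up for illustration.

```python
def weight_bytes(n_params, bytes_per_param):
    """Memory needed to store the model weights at a given precision."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value):
    """Memory for one sequence's KV cache: one key and one value tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape (assumption, for illustration only).
N_PARAMS = 70_000_000_000
N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN = 80, 8, 128, 8192

weights_16bit = weight_bytes(N_PARAMS, 2)
weights_fp8 = weight_bytes(N_PARAMS, 1)
kv_16bit = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, SEQ_LEN, 2)

GIB = 1024 ** 3
print(f"weights, 16-bit: {weights_16bit / GIB:.1f} GiB")
print(f"weights, fp8:    {weights_fp8 / GIB:.1f} GiB")
print(f"KV cache, 16-bit, one {SEQ_LEN}-token sequence: {kv_16bit / GIB:.1f} GiB")
```

With these placeholder numbers, storing weights in 8 bits halves the weight footprint, while the KV cache kept at 16 bits is unaffected and grows linearly with sequence length.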