| HN Mirror

	by vessenes 790 days ago
	Well, Groq's memory per rack looks very low. That said, the whole world now understands how exciting very high inference throughput is, in a way that almost nobody did when Groq started. (I think I saw an early pitch deck, and I don't recall fast inference even being a differentiator in it, although I could be wrong.) However, the number of Groq chips/servers/whatever you call them needed to run Llama 400B looks like a lot: many, many racks' worth. Plus, Groq claims to have converted to being only a cloud provider now, and will keep its hardware to itself. Given that I can't even sign up for a pay-as-you-go API key with Groq right now, I think there's a lot of room for competitors.
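
	For anyone who wants to sanity-check the "many racks" estimate, here is a rough back-of-envelope sketch. Everything numeric in it is an assumption on my part, not something stated above: roughly 230 MB of on-chip SRAM per chip, 64 chips per rack, and weights held entirely in SRAM at 1 or 2 bytes per parameter. It also ignores KV cache, activations, and any replication, so if anything it undercounts. The function name `racks_needed` is just an illustrative helper.

```python
def racks_needed(params_billion, bytes_per_param, sram_mb_per_chip, chips_per_rack):
    """Estimate chips and racks needed to hold model weights entirely in on-chip SRAM.

    All inputs are caller-supplied assumptions; nothing here is a vendor figure.
    Only weights are counted (no KV cache, activations, or redundancy).
    """
    weight_bytes = params_billion * 10**9 * bytes_per_param
    chip_bytes = sram_mb_per_chip * 10**6
    # Ceiling division: a partially filled chip or rack still counts as a whole one.
    chips = -(-weight_bytes // chip_bytes)
    racks = -(-chips // chips_per_rack)
    return chips, racks


if __name__ == "__main__":
    # Assumed inputs: 400B parameters, 230 MB SRAM per chip, 64 chips per rack.
    for bytes_per_param in (2, 1):
        chips, racks = racks_needed(400, bytes_per_param, 230, 64)
        print(f"{bytes_per_param} byte(s)/param: {chips} chips, {racks} racks")
```

	With those assumed inputs it comes out to dozens of racks either way, which is the point: the answer is dominated by how little memory sits on each chip, so swap in your own numbers if you have better ones.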