by dylan522p 1226 days ago
- Google qps is closer to 100k than 320k [1]

That number is wrong. I have a number from a Googler, not livestats, which cannot have Google internal data.

- Not every query has to run on LLM. Probably only 10% would benefit from it

Agree. I have something different coming up that looks into this more; 10% may be too low. I know I used 100%, which isn't right, and I explicitly say that.

- This means 10,000 queries per second, each needing 5 A100s to run, so 50,000 A100s are sufficient. Cost for that is $500MM, quadruple that to $2B with CPU/RAM/storage/network. That is peanuts for Google.

50k A100s plus networking and ramp cost way more than $2B, and that's before accounting for HW utilization rate.
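
For reference, the quoted estimate's arithmetic can be reproduced with a quick sketch. Every input below is the quoted commenter's figure, not measured data; the function and parameter names are made up for illustration, and the $10,000 per-A100 price is only what $500MM / 50,000 implies. It does not model networking, ramp, or utilization, which is the part disputed above.

```python
# Back-of-envelope reproduction of the quoted estimate.
# All defaults come from the quoted comment; names are hypothetical.
def a100_fleet_estimate(
    total_qps=100_000,         # quoted claim: "closer to 100k"
    llm_fraction=0.10,         # quoted claim: ~10% of queries benefit from an LLM
    a100s_per_qps=5,           # quoted claim: each query/sec needs 5 A100s
    a100_unit_cost_usd=10_000, # implied by $500MM / 50,000 A100s
    system_multiplier=4,       # quoted claim: quadruple for CPU/RAM/storage/network
):
    llm_qps = round(total_qps * llm_fraction)
    a100s = llm_qps * a100s_per_qps
    gpu_cost_usd = a100s * a100_unit_cost_usd
    total_cost_usd = gpu_cost_usd * system_multiplier
    return {
        "llm_qps": llm_qps,
        "a100s": a100s,
        "gpu_cost_usd": gpu_cost_usd,
        "total_cost_usd": total_cost_usd,
    }


if __name__ == "__main__":
    for key, value in a100_fleet_estimate().items():
        print(f"{key}: {value:,}")
```

With the defaults this gives 10,000 LLM queries per second, 50,000 A100s, $500MM in GPUs, and $2B all-in, matching the quoted numbers. Changing total_qps or llm_fraction shows how sensitive the total is to the two inputs being argued over.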

- Latency, not cost, is a bigger issue. This should be addressed soon by H100 and newer chips.

That's discussed in the subscriber section. It's both, but yes, latency is the bigger issue. H100 helps but doesn't solve it.