Hacker News
by MuffinFlavored 71 days ago
> Running DeepSeek V3 (685B) requires 8×H100 GPUs which is about $14k/month. Most developers only need 15-25 tok/s.

> deepseek-v3.2-685b, $40/mo/slot for ~20 tok/s, 465 slots total

> 465 users × 20 tok/s = 9,300 tok/s needed

> The node peaks at ~3,000 tok/s total. So at full capacity they can really only serve:

> 3,000 ÷ 20 = 150 concurrent users at 20 tok/s

> That's only 32% of the cohort being active simultaneously.
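
The arithmetic in the quoted lines can be checked with a quick sketch. All figures come from the quote itself (the ~3,000 tok/s peak is the quote's claim, not something measured here); the variable names and the revenue comparison are illustrative additions.

```python
# Figures taken from the quoted post; variable names are illustrative.
slots = 465                   # slots sold
price_per_slot_usd = 40       # $/month per slot
per_user_tok_s = 20           # advertised tok/s per slot
node_peak_tok_s = 3000        # claimed peak aggregate throughput of the node
node_cost_usd = 14_000        # quoted monthly cost of 8xH100

demand_tok_s = slots * per_user_tok_s              # if every slot is active at once
max_concurrent = node_peak_tok_s // per_user_tok_s # users servable at full speed
active_fraction = max_concurrent / slots
monthly_revenue_usd = slots * price_per_slot_usd

print(demand_tok_s)               # 9300
print(max_concurrent)             # 150
print(f"{active_fraction:.0%}")   # 32%
print(monthly_revenue_usd - node_cost_usd)  # 4600
```

So the offer is oversubscribed by roughly 3x on throughput, and the pricing only covers the quoted hardware cost if nearly all 465 slots are sold.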

1 comment

People presumably work about 8 hours a day, and I guess that is what they are banking on.
The idea only works if the users are spread evenly around the globe (which is likely more or less the case). If the users are concentrated in one country, the token rate will be terrible.
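
A rough sketch of that argument, under loudly stated assumptions: every user generates tokens continuously during one 8-hour window per day (pessimistic), the node's ~3,000 tok/s is shared equally among whoever is active, and "evenly distributed" means the windows are spread uniformly over 24 hours. None of this is from the provider; it is back-of-envelope only.

```python
slots = 465
node_peak_tok_s = 3000   # peak figure from the quoted post
work_hours = 8           # assumed active window per user per day

# Evenly spread around the globe: at any moment 8/24 of users are active.
spread_active = slots * work_hours / 24
spread_rate = node_peak_tok_s / spread_active

# Concentrated in one time zone: everyone's window overlaps.
concentrated_active = slots
concentrated_rate = node_peak_tok_s / concentrated_active

print(spread_active)                 # 155.0
print(f"{spread_rate:.1f}")          # 19.4
print(f"{concentrated_rate:.1f}")    # 6.5
```

Under these assumptions an even spread lands almost exactly on the advertised ~20 tok/s, while a single-time-zone user base would drop to about a third of that during working hours.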