by claytonjy 87 days ago
> most teams we talk to can't even tell you how many GPUs are in use right now

how can this be? isn’t this a trivial metric to pull from any cloud’s monitoring service?

to get the good ones (H100+) you generally have to reserve them, a fixed cost you pay monthly and can’t pretend to not know

1 comment

Fair comment, especially for the case you mention, where capacity is fixed as part of a reservation. Even with fixed reservations, we've seen cases where basic monitoring doesn't tell the full story, for example instances that appear to be running while SM activity is near 0. We've also heard from teams using on-demand capacity across clouds that they haven't yet stitched their monitoring together to see exactly who is using what, and where, in a single dashboard. That's something our monitoring dashboards help provide insight into.
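
To make the "running but idle" case concrete, here is a minimal sketch in Python. It assumes you already collect per-GPU SM activity samples as fractions between 0 and 1 (for example from an exporter such as NVIDIA DCGM). The `GpuSample` type, the `find_idle_gpus` helper, the GPU ids, and the 5% threshold are all hypothetical names and values chosen for illustration, not part of any particular product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GpuSample:
    gpu_id: str         # hypothetical identifier, e.g. "node-a/gpu0"
    sm_activity: float  # fraction of time the SMs were busy, 0.0 to 1.0


def find_idle_gpus(samples, threshold=0.05):
    """Return sorted ids of GPUs whose mean SM activity is below threshold.

    The threshold is an assumed cutoff; tune it for your workloads.
    """
    by_gpu = {}
    for s in samples:
        by_gpu.setdefault(s.gpu_id, []).append(s.sm_activity)
    return sorted(g for g, vals in by_gpu.items() if mean(vals) < threshold)


# Made-up samples: gpu0 is allocated but nearly idle, gpu1 is busy.
samples = [
    GpuSample("node-a/gpu0", 0.01),
    GpuSample("node-a/gpu0", 0.02),
    GpuSample("node-a/gpu1", 0.85),
    GpuSample("node-a/gpu1", 0.90),
]
idle = find_idle_gpus(samples)
```

Both GPUs here would show up as "in use" in an instance-level view, but only `node-a/gpu1` is doing meaningful work, which is the gap an instance count alone won't reveal.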