by jonathanlei 855 days ago
Jonathan from TensorDock (https://tensordock.com/) here - we listed two of our A100 and H100 clusters on the site.

The IB fabric on our clusters (can't speak to others) is 8x 400 Gbps. Most customers training foundation models are able to fully utilize that fabric in parallel.

1 comment

Which HCAs are enabling that? Are you using eight 4-link QSFPs here, presuming this is NDR?

And out of curiosity, is aggregate bandwidth the normal marketing metric in this industry? In my neck of the woods this would be reported as an NDR400 system.
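
For what it's worth, the two ways of quoting the number differ only by a multiplication. A rough sketch of the arithmetic, assuming NDR signalling at 100 Gb/s per lane, 4-lane ports, and that the "8x" above means eight such ports per node (that last part is my assumption, not something stated in the parent comment):

```python
# Per-port vs. aggregate InfiniBand bandwidth, under stated assumptions.
NDR_LANE_GBPS = 100   # NDR signalling rate per lane, in Gb/s
LANES_PER_PORT = 4    # a "4x" link width
PORTS_PER_NODE = 8    # assumption: the "8x" in the parent comment is per node


def per_port_gbps(lane_gbps=NDR_LANE_GBPS, lanes=LANES_PER_PORT):
    """Bandwidth of one port: what an 'NDR400' label describes."""
    return lane_gbps * lanes


def aggregate_gbps(ports=PORTS_PER_NODE):
    """Sum across all ports: what an '8x 400 Gbps' label describes."""
    return ports * per_port_gbps()


print(per_port_gbps())   # 400
print(aggregate_gbps())  # 3200
```

So "NDR400" names the per-port rate, while "8x 400 Gbps" adds up to 3.2 Tb/s across the ports.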