by sjg
957 days ago
We had the same issue in our lab when we were speccing a similar install. The one thing I didn't fully realise is that you need an Enterprise subscription to run the A100s, H100s and other cards. The drivers for these cards are behind a paywall, and for academia it looks like it's around $150 per card, per year (if you want to run in a DC - https://www.nvidia.com/content/dam/en-zz/Solutions/design-vi... ).

We bought 3 A100s and 3 L40s (not L40S) for the servers; the count was limited by space in the servers. NVLink can be added later (it's just a bridge between the cards), so if you can get the cards without NVLink, I would. (NVLink works well with cards stacked vertically rather than horizontally.)

We also bought a desktop system with dual 4090s (£15k, about $18.5k) to start on smaller models before scaling up to our servers, where the real grunt work happens. This worked well, as many problems can be solved with smaller models before going all out and needing 4 H100s.

Hope this helps with the planning. Feel free to reach out if you want to chat more about our setup. (My Hacker News username is the same as on GitHub, and you'll find my email address there.)
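
For budgeting, here is a rough back-of-envelope sketch in Python. It assumes the roughly $150 per card, per year academic figure mentioned above, and that the fee applies equally to every licensed card; both are assumptions, and the function name and defaults are made up for illustration.

```python
def annual_license_cost(num_cards: int,
                        per_card_per_year: float = 150.0,
                        years: int = 1) -> float:
    """Estimate total subscription cost in USD.

    per_card_per_year defaults to the approximate academic price quoted
    in the comment above (an assumption, not an official figure).
    """
    if num_cards < 0 or years < 0:
        raise ValueError("num_cards and years must be non-negative")
    return num_cards * per_card_per_year * years


if __name__ == "__main__":
    # 3 cards for one year, then 6 cards over a 3-year budget cycle.
    print(annual_license_cost(3))
    print(annual_license_cost(6, years=3))
```

At that price the subscription is small next to the hardware, but it recurs every year, so it is worth putting in the grant budget up front.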