by spott 699 days ago
I expect AMD also loses out once multi-GPU becomes necessary, since their interconnect is just way slower. That point arguably arrives only at much larger models than on the H100, but training a 70B-parameter model in bf16 is going to hit multi-GPU memory requirements either way.
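To put rough numbers on the memory claim, here is a back-of-envelope sketch. The 16 bytes per parameter is an assumption (a commonly quoted figure for mixed-precision Adam training: bf16 weights and gradients plus fp32 master weights and two fp32 optimizer moments), and activations are ignored, so treat the results as a lower bound. The 80 GB and 192 GB capacities are likewise assumed stand-ins for an H100-class card and a large-memory AMD card; the thread names no specific AMD part.

```python
# Back-of-envelope memory estimate. All byte counts are assumptions, not measurements.
PARAMS = 70_000_000_000  # the 70B-parameter model from the comment

# Assumed per-parameter cost for mixed-precision Adam training:
# 2 (bf16 weights) + 2 (bf16 grads) + 4 (fp32 master weights) + 4 + 4 (Adam moments)
BYTES_PER_PARAM_TRAINING = 2 + 2 + 4 + 4 + 4


def training_state_bytes(params: int) -> int:
    """Weights, gradients, and optimizer state; activations are NOT included."""
    return params * BYTES_PER_PARAM_TRAINING


def weights_only_bytes(params: int, bytes_per_weight: int = 2) -> int:
    """Memory for the bf16 weights alone, roughly the floor for inference."""
    return params * bytes_per_weight


def min_gpus(total_bytes: int, gpu_bytes: int) -> int:
    """Ceiling division: the fewest cards whose combined memory holds total_bytes."""
    return -(-total_bytes // gpu_bytes)


# Hypothetical card capacities in bytes (1 GB taken as 10**9 bytes).
GPU_MEMORY_BYTES = {"80 GB card": 80 * 10**9, "192 GB card": 192 * 10**9}

for name, capacity in GPU_MEMORY_BYTES.items():
    train = min_gpus(training_state_bytes(PARAMS), capacity)
    infer = min_gpus(weights_only_bytes(PARAMS), capacity)
    print(f"{name}: at least {train} for training state, {infer} for bf16 weights")
```

Under these assumptions the training state alone is about 1.12 TB, so neither card size avoids multi-GPU for training, while the bf16 weights (140 GB) would fit on a single 192 GB card but not on a single 80 GB one.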

Yes, but as far as I understand it, the interconnect is not really important for model inference. It matters more for model training.
Depends on whether you can fit the whole model into VRAM. If you can’t, then you need some sort of GPU parallelism, and with it some sort of communication between the GPUs. But maybe that messaging is small enough that it doesn’t majorly slow down inference. I’m not sure.
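One way to get a feel for how small that messaging might be is a rough payload estimate. It assumes Megatron-style tensor parallelism (two all-reduces of the hidden-state vector per transformer layer in the forward pass), bf16 activations, and Llama-2-70B-like dimensions (hidden size 8192, 80 layers). None of that comes from the thread, and the link bandwidth below is a made-up placeholder, so this is a sketch, not a benchmark.

```python
# Rough all-reduce payload per generated token under tensor parallelism.
# Every number here is an assumption chosen for illustration.

def allreduce_payload_bytes_per_token(
    hidden_size: int,
    num_layers: int,
    bytes_per_value: int = 2,       # bf16 activations (assumed)
    allreduces_per_layer: int = 2,  # Megatron-style TP: attention + MLP (assumed)
) -> int:
    """Bytes of hidden-state data all-reduced to decode one token at batch size 1."""
    return hidden_size * bytes_per_value * allreduces_per_layer * num_layers


def transfer_seconds(payload_bytes: int, link_bytes_per_second: float) -> float:
    """Pure bandwidth time; ignores per-message latency, which may dominate here."""
    return payload_bytes / link_bytes_per_second


# Llama-2-70B-like shape (assumed).
payload = allreduce_payload_bytes_per_token(hidden_size=8192, num_layers=80)
print(f"payload per token: {payload} bytes ({payload / 2**20:.1f} MiB)")

# Hypothetical link speed; substitute the real figure for your hardware.
HYPOTHETICAL_LINK_BYTES_PER_SECOND = 100e9
seconds = transfer_seconds(payload, HYPOTHETICAL_LINK_BYTES_PER_SECOND)
print(f"bandwidth-only transfer time: {seconds * 1e6:.1f} microseconds")
```

With these assumptions the payload is a few MiB per token, which suggests bandwidth is less of a bottleneck for single-stream decoding than for training, where gradients for the full parameter set have to be exchanged. Because the messages are small and numerous, per-message latency may matter more than raw bandwidth, and this estimate does not capture that.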