by wkat4242 705 days ago
Yes, but as far as I understand it, the interconnect isn't really important for model inference. It matters more for model training.

Depends on whether you can fit the whole model into VRAM. If you can’t, you need some form of GPU parallelism, which means some communication between the GPUs. But maybe that traffic is small enough that it doesn’t significantly slow down inference. I’m not sure.
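
To put rough numbers on the "is the traffic small enough" question, here is a back-of-envelope sketch. It assumes the model is split by layers across GPUs (pipeline style), so only one hidden-state vector per generated token crosses each GPU boundary. The hidden size, link bandwidth, and latency below are illustrative guesses, not measurements, and the function names are made up for this example. Splitting within layers (tensor parallelism) needs a sync in every layer and would move a lot more data, so this is closer to the optimistic case.

```python
def activation_bytes_per_token(hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes crossing one GPU boundary per generated token (layer-wise split).

    Assumes a single hidden-state vector is handed to the next GPU,
    stored at bytes_per_value bytes per element (2 = fp16/bf16).
    """
    return hidden_size * bytes_per_value


def transfer_seconds(num_bytes: int, link_bytes_per_s: float, latency_s: float = 0.0) -> float:
    """Simple cost model: fixed per-message latency plus size / bandwidth."""
    return latency_s + num_bytes / link_bytes_per_s


if __name__ == "__main__":
    hidden_size = 8192   # assumption: hidden dimension of a large model
    link = 16e9          # assumption: ~16 GB/s effective link bandwidth
    latency = 10e-6      # assumption: ~10 microseconds per message

    size = activation_bytes_per_token(hidden_size)
    cost = transfer_seconds(size, link, latency)
    print(f"{size} bytes per token per boundary, ~{cost * 1e6:.1f} microseconds per hop")
```

Under these assumptions the per-token hop is dominated by the fixed latency rather than the bandwidth, which would fit the idea that interconnect speed matters less for this kind of inference than for training. Real numbers depend on batch size, how the model is split, and the actual hardware.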