|
|
|
|
|
by nextaccountic
4 days ago
|
|
How can you combine CPU cores and multiple GPU? Are you running some layers in cpu, others in gpu #1, and others in gpu #2? What about the bandwidth and latency between them? Or maybe the model itself only runs at gpus, and the cpu memory only store the weights for experts not corrently activated? If so, then what's the 32 or 64 cpu cores for? I'm a big fan of fully utilizing one's hardware and it's kinda sad that it's not the norm to run things on either gpu, cpu or both, dynamically choosing at runtime, for everyday software |
|