by neilmovva
936 days ago
I agree that synchronization causes overhead, so 2x GPUs won't achieve the ideal 0.5x total runtime. But taking your Alpaca benchmark as an example, 2x GPUs take 3.6x the runtime with Hugging Face, or 1.15x with Unsloth Max. In other words, every benchmark, in both HF and Unsloth, is slower in absolute terms when going from 1 to 2 GPUs. That makes me think something is wrong with the test. Could you share your benchmark code?
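To make the arithmetic concrete, here is a rough sketch that turns those runtime ratios into speedup and scaling efficiency. It assumes each ratio is the 2-GPU runtime divided by the 1-GPU runtime of the same setup, which is my reading of the numbers; `scaling_summary` is just an illustrative helper, not part of any library.

```python
def scaling_summary(runtime_ratio, num_gpus):
    """Convert a multi-GPU / single-GPU runtime ratio into speedup and efficiency.

    Assumption: runtime_ratio = (runtime on num_gpus) / (runtime on 1 GPU).
    """
    speedup = 1.0 / runtime_ratio    # > 1 means the multi-GPU run is faster
    efficiency = speedup / num_gpus  # 1.0 means perfect linear scaling
    return speedup, efficiency


# Ratios quoted above: the ideal case plus the two Alpaca results.
for label, ratio in [("ideal", 0.5), ("Hugging Face", 3.6), ("Unsloth Max", 1.15)]:
    speedup, efficiency = scaling_summary(ratio, num_gpus=2)
    print(f"{label}: {speedup:.2f}x speedup, {efficiency:.0%} scaling efficiency")
```

Any speedup below 1.0x means the second GPU made the run slower, which is the case for both measured ratios.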