by anthonix1
627 days ago
Any direct comparisons to 8xH100? 2 tokens/sec seems very slow! I haven't done any LoRA training on MI300X myself, but I have done Llama 3.1 full training on 8xMI300X and got pretty close to 8xH100 performance with my own kernels (ROCm is just too slow).
|
My train step was taking 30s.
And I was using a batch size of 16 and a sequence length of 64, so the training speed works out to (16*64/30) tokens per sec ≈ 34 tokens per second (for fine-tuning in JAX eager mode).
(I haven't done a comparison with 8xH100.)
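
For anyone wanting to redo the arithmetic with their own numbers, here is a minimal sketch of the throughput calculation. The helper name `tokens_per_second` is made up for illustration, and it assumes every sequence in the batch is full length (no padding), so it is an upper bound on useful tokens rather than a benchmark result.

```python
def tokens_per_second(batch_size: int, seq_len: int, step_seconds: float) -> float:
    """Tokens processed per second of wall-clock time for one training step.

    Hypothetical helper: assumes all sequences are full length (no padding).
    """
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    return batch_size * seq_len / step_seconds


# Numbers from the comment above: batch size 16, sequence length 64, 30 s per step.
tps = tokens_per_second(batch_size=16, seq_len=64, step_seconds=30.0)
print(f"{tps:.1f} tokens/sec")  # 1024 tokens / 30 s, roughly 34 tokens/sec
```

Plugging in a different step time or batch shape gives the equivalent figure for another setup, which makes it easier to compare against numbers from other hardware.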