by rft
17 days ago
> The MI100 is roughly double the performance on Qwen 3.5 35B A3B Q5_K_M to the R9700 (462 token/s prefill vs 239 tokens/s, 217 tokens/s vs 118 token/s for inference)

Those prefill numbers look really low to me. I can run nearly the same model (Qwen 3.6) at Q4_K_M with a q6 cache on a single 3090 and get 2.3k-4.4k tokens/s prefill and 100-170 tokens/s generation. Based on raw numbers alone, I would expect the R9700 to land around 70-90 tokens/s generation (it has about 2/3 of the memory bandwidth of a 3090) and at least the same or higher prefill (nearly 3x the FP16 TOPS on the R9700).

So the numbers really don't add up. Was the benchmark done with some special settings, e.g. parallel requests or a very short prompt length?
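For anyone who wants to redo the back-of-the-envelope estimate, here is a minimal sketch. It assumes generation speed scales linearly with memory bandwidth, which is only a rough rule of thumb, and it uses nothing but the figures quoted in this thread; the function and variable names are made up for illustration.

```python
def scale_by_bandwidth(reference_tok_s, bandwidth_ratio):
    """Linearly scale a (low, high) generation rate by a memory bandwidth ratio.

    Assumption: token generation is memory-bandwidth bound, so tokens/s
    scales roughly in proportion to bandwidth. Real results will deviate.
    """
    low, high = reference_tok_s
    return (low * bandwidth_ratio, high * bandwidth_ratio)


# Figures taken from the comment above, not from spec sheets.
rtx3090_generation = (100.0, 170.0)   # tokens/s observed on a single 3090
rtx3090_prefill_low = 2300.0          # tokens/s, low end of observed prefill
bandwidth_ratio = 2.0 / 3.0           # "about 2/3 of memory bandwidth of a 3090"

r9700_reported_prefill = 239.0        # tokens/s, from the quoted benchmark

estimate = scale_by_bandwidth(rtx3090_generation, bandwidth_ratio)
prefill_gap = rtx3090_prefill_low / r9700_reported_prefill

print(f"naive R9700 generation estimate: {estimate[0]:.0f}-{estimate[1]:.0f} tokens/s")
print(f"3090 low-end prefill is {prefill_gap:.1f}x the reported R9700 prefill")
```

The naive linear scaling gives a somewhat wider band than my 70-90 guess, but either way the generation figure is in a plausible range. The prefill figure is the one that is off by roughly an order of magnitude.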