by dylan522p
1223 days ago
Your math is completely wrong, dude. It's 2000 ms per token, not for the whole query. Hardware utilization rate and MFU are not the same thing, and you forgot the latter. You're also pretending it's perfectly parallelized on 1 GPU. I use 8x GPU box throughput.
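
Rough sketch of the arithmetic I mean. Only the 2000 ms per token figure is from this thread. The token count, peak FLOP/s, and MFU value below are made-up placeholders, and the function names are just for illustration:

```python
def query_latency_s(ms_per_token: float, output_tokens: int) -> float:
    """Total generation time when latency is quoted per token, not per query."""
    return ms_per_token * output_tokens / 1000.0


def effective_flops(peak_flops_per_gpu: float, mfu: float, num_gpus: int) -> float:
    """Useful model FLOP/s for a multi-GPU box: peak FLOP/s scaled by MFU and GPU count."""
    return peak_flops_per_gpu * mfu * num_gpus


# 2000 ms per token is the figure from the comment; 100 output tokens is an assumption.
print(query_latency_s(ms_per_token=2000, output_tokens=100))  # 200.0 seconds, not 2

# Hypothetical hardware numbers: 1e15 peak FLOP/s per GPU, 40% MFU, 8 GPUs.
print(effective_flops(peak_flops_per_gpu=1e15, mfu=0.4, num_gpus=8))
```

The point is that per-token latency multiplies by output length, and usable compute is peak times MFU times the number of GPUs in the box, not the peak of a single GPU.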