by 152334H
1210 days ago
I spoke with the authors of the paper; the leftmost points in Figure 1 were generated with batch size 1, indicating ~1.2x and ~2x speed improvements over DeepSpeed for the 30B and 175B models respectively. For reference, on 175B this is a speedup from ~0.009 tokens/s to ~0.02 tokens/s. These results are generally unimpressive, of course. Most of the improvement at that point is attributable to the authors using a stripped-down library for autoregressive sampling. HN falling for garbage once again...
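
To put the 175B numbers in perspective, here is a quick back-of-the-envelope sketch. The variable and function names are mine, and the inputs are just the rough figures quoted above, so treat the output as approximate:

```python
# Rough throughput figures quoted above (175B model, batch size 1).
# These are approximations from the comment, not measured values.
baseline_tps = 0.009  # ~tokens/s with DeepSpeed
improved_tps = 0.02   # ~tokens/s with the paper's system


def speedup(baseline: float, improved: float) -> float:
    """Ratio of improved throughput to baseline throughput."""
    return improved / baseline


def seconds_per_token(tokens_per_second: float) -> float:
    """Latency per generated token, the inverse of throughput."""
    return 1.0 / tokens_per_second


ratio = speedup(baseline_tps, improved_tps)
print(f"speedup: ~{ratio:.1f}x")
print(
    f"latency: ~{seconds_per_token(baseline_tps):.0f}s -> "
    f"~{seconds_per_token(improved_tps):.0f}s per token"
)
# speedup: ~2.2x
# latency: ~111s -> ~50s per token
```

In other words, at batch size 1 you go from waiting nearly two minutes per token to a bit under one minute.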
It's also a neat result that fp4 quantization doesn't cause much of an issue even at 175B, though that was kinda to be expected.