by pbnjay
726 days ago
Yeah... They are using a single-core 13 W measurement to project out to a 64x parallelization, with no mention of any overhead due to parallelization or the power needs of the supporting hardware.

This is a key quote for me (page 12 of the PDF):

> The 1.3B parameter model, where L = 24 and d = 2048, has a projected runtime of 42ms, and a throughput of 23.8 tokens per second.

i.e. 64 x 13.67 W ≈ 875 W to run a 1.3B model at 23.8 t/s... I'm pretty sure my phone can do way better than that! Even half that power, going by the assertions in their table, is still far too much for such a small model.
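For anyone who wants to check the arithmetic, here is a rough sketch. It assumes perfectly linear scaling across the 64 cores with zero overhead, which is the most generous reading of the projection. The function and constant names are mine, and the joules-per-token figure is something I derived from the quoted numbers, not a value stated in the paper.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: power scales linearly with core count, no parallelization
# overhead, no supporting hardware.

CORES = 64               # parallelization factor used in the projection
WATTS_PER_CORE = 13.67   # single-core power measurement
LATENCY_S = 0.042        # projected runtime per token (42 ms)


def projected_power_w(cores=CORES, watts_per_core=WATTS_PER_CORE):
    """Total power in watts under the linear-scaling assumption."""
    return cores * watts_per_core


def tokens_per_second(latency_s=LATENCY_S):
    """Throughput implied by a fixed per-token latency."""
    return 1.0 / latency_s


def joules_per_token(power_w, tps):
    """Energy per generated token (derived, not from the paper)."""
    return power_w / tps


power = projected_power_w()
tps = tokens_per_second()
energy = joules_per_token(power, tps)
print(f"{power:.1f} W, {tps:.1f} t/s, {energy:.1f} J/token")
```

Any real overhead only pushes the power number up from here, so this is a lower bound on what the projection implies.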