HN Mirror


by pbnjay 726 days ago

Yeah... They are using a single-core 13W measurement and projecting it out to 64x parallelization, with no mention of any parallelization overhead or of the power needs of the supporting hardware. This is a key quote for me (page 12 of the PDF):

> The 1.3B parameter model, where L = 24 and d = 2048, has a projected runtime of 42ms, and a throughput of 23.8 tokens per second.

E.g. 64 x 13.67W = ~875 Watts to run a 1.3B model at 23.8 t/s... I'm pretty sure my phone can do way better than that! Even half that power, given their assertions in the table, is still far too much for such a small model.
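
To make the arithmetic easy to check, here is a rough sketch in Python. It assumes, as argued above, that the 23.8 t/s figure already includes the 64x parallelization and that power scales linearly with core count with no overhead. The constant and function names are my own, not from the paper.

```python
# Back-of-the-envelope check of the projected power draw.
# Assumptions (not confirmed by the paper): power scales linearly with the
# number of cores, and the quoted throughput already includes the 64x.

CORES = 64               # parallel units in the projection
WATTS_PER_CORE = 13.67   # single-core measurement quoted above
TOKENS_PER_SEC = 23.8    # projected throughput for the 1.3B model


def total_power_watts(cores=CORES, watts_per_core=WATTS_PER_CORE):
    """Total power if every core draws the single-core figure."""
    return cores * watts_per_core


def joules_per_token(power_watts, tokens_per_sec=TOKENS_PER_SEC):
    """Energy per generated token at the given power and throughput."""
    return power_watts / tokens_per_sec


power = total_power_watts()
energy = joules_per_token(power)
print(f"{power:.2f} W total, {energy:.1f} J per token")
```

Under those assumptions the energy per token is the number to compare against other hardware, since it does not depend on how many cores you choose to count.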


shivaluminaire 720 days ago

When you multiply by 64 you also get 64 times more tokens per second!! Your math is wrong.


pbnjay 711 days ago

That's their math: the 23.8 t/s figure already includes the 64x, but they didn't apply the 64x to the other stats.

