Hacker News
by imtringued 804 days ago
>doesn’t actually save FLOPs (uses more)

Does anyone even care? Saving FLOPs does nothing if you have to load the entire model anyway. Going from two FLOPs per parameter to 0.5 or whatever might sound cool on paper, but you're still loading every one of those parameters from memory, so you've gained nothing.

1 comment

Companies that run these things care: they run at huge batch sizes and are compute-bound in the limit.
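Both points can be checked with a rough roofline-style estimate. The sketch below is only back-of-the-envelope arithmetic: the hardware numbers (1e15 FLOP/s peak, 2e12 bytes/s memory bandwidth), the 2 bytes per parameter, and the function names are all assumptions for illustration, and it ignores the KV cache, activations, and any overlap of compute with memory traffic.

```python
def step_time(n_params, batch, flops_per_param=2.0, bytes_per_param=2.0,
              peak_flops=1e15, mem_bw=2e12):
    """Estimate the time of one forward step over a batch of tokens.

    All hardware defaults are hypothetical. Weights are assumed to be
    read from memory once per step, regardless of batch size.
    """
    compute_s = n_params * flops_per_param * batch / peak_flops
    memory_s = n_params * bytes_per_param / mem_bw
    bound = "compute" if compute_s > memory_s else "memory"
    # The slower of the two resources sets the step time.
    return max(compute_s, memory_s), bound


def crossover_batch(flops_per_param=2.0, bytes_per_param=2.0,
                    peak_flops=1e15, mem_bw=2e12):
    """Batch size at which compute time equals weight-loading time."""
    return bytes_per_param * peak_flops / (mem_bw * flops_per_param)


if __name__ == "__main__":
    n = 7e9  # hypothetical 7B-parameter model
    for batch in (1, 4096):
        for fpp in (2.0, 0.5):
            t, bound = step_time(n, batch, flops_per_param=fpp)
            print(f"batch={batch:5d} flops/param={fpp}: {t * 1e3:.2f} ms ({bound}-bound)")
    print("crossover batch at 2 FLOPs/param:", crossover_batch())
```

Under these assumed numbers, batch size 1 is memory-bound and cutting FLOPs per parameter from 2 to 0.5 leaves the step time unchanged, which is the parent's point. At batch size 4096 the step is compute-bound and the same cut makes it about 4x faster, which is the reply's point.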