by littlestymaar 200 days ago
> The best way to drive inference cost down right now is to use TPUs

TPUs are cool, but the biggest lever remains reducing your (active) parameter count.
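
For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token, and it ignores attention over the context, memory bandwidth, and batching effects. The 70B dense and 13B-active MoE sizes are hypothetical, picked only for illustration.

```python
def forward_flops_per_token(active_params: int) -> int:
    """Approximate forward-pass compute per generated token.

    Assumption: ~2 FLOPs (one multiply, one add) per active parameter.
    This is a rule of thumb, not a measurement of any real system.
    """
    return 2 * active_params


# Hypothetical model sizes, for illustration only.
dense = forward_flops_per_token(70_000_000_000)  # dense model: all 70B params active
moe = forward_flops_per_token(13_000_000_000)    # MoE model: 13B params active per token

ratio = dense / moe
print(f"dense: {dense:.2e} FLOPs/token")
print(f"moe:   {moe:.2e} FLOPs/token")
print(f"compute ratio: {ratio:.1f}x")
```

Under that approximation, per-token compute scales linearly with active parameters, so cutting them helps on whatever hardware you run on.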