Hacker News new | ask | show | jobs
by brandall10 916 days ago
I think we're at least a couple generations away where this is feasible for these models, unless say it's for performing a limited number of background tasks at fairly slow inference speed. SOC power draw limits will probably limit inference speed to about 2-5 tok/sec (lower end for Mixtral which has the processing requirements of a 14B) and would suck an iPhone Max dry in about an hour.