| HN Mirror


by lumost 1 hour ago

The big question for local LLMs is whether there can be a model that runs at 100 tok/s, needs less than 16 GB of memory, and is competitive with the cloud models on most tasks.
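
For a rough sense of what those two numbers imply together, here is a back-of-envelope sketch. It assumes token generation is memory-bandwidth bound, i.e. every generated token has to read all active weights once, and it ignores KV cache, activations, and batching. The function names and the 8B / 24B / 4-bit example sizes are made up for illustration, not taken from any real model.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def required_bandwidth_gb_s(target_tok_s: float, active_gb: float) -> float:
    """Memory bandwidth needed if each token reads all active weights once."""
    return target_tok_s * active_gb


def max_decode_tok_s(bandwidth_gb_s: float, active_gb: float) -> float:
    """Upper bound on decode speed under the same bandwidth-bound assumption."""
    return bandwidth_gb_s / active_gb


if __name__ == "__main__":
    # Hypothetical dense models quantized to 4 bits per weight.
    for params in (8, 24):
        size = weights_gb(params, 4)
        need = required_bandwidth_gb_s(100, size)
        print(f"{params}B @ 4-bit: ~{size:.0f} GB weights, "
              f"~{need:.0f} GB/s needed for 100 tok/s")
```

Under these assumptions, fitting in 16 GB is the easy half. Sustaining 100 tok/s is what pushes on memory bandwidth, unless fewer weights are read per token (e.g. a sparse or mixture-of-experts design).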

There is some signal that this is possible through both hardware innovation and training/data improvements.

Cloud models have their own constraints - I can’t have opus4.8 spend 4 hours on a deep research question I had in the shower without spending money. I can’t do real-time video game upscaling and graphics work in the cloud, period.

A laptop is about an order of magnitude cheaper than a cloud server thanks to consumer-hardware economies of scale, the lack of uptime requirements, and other factors.