by zozbot234 66 days ago
Local open inference can address hardware scarcity by repurposing hardware that users already need for other purposes. But since that hardware is a lot weaker than a proper datacenter setup, it will mostly be useful for running non-time-critical inference as a batch task.

Many users will also seek to go local as insurance against rug pulls from the proprietary models side (we're not quite sure whether the third-party inference market will grow enough to provide robust competition). Ultimately, though, if you want to make good use of your hardware as a single user, you'll also be pushed towards mostly running long batch tasks rather than realtime chat (except with tiny models) or human-assisted coding.