| HN Mirror



	by zozbot234 36 days ago
	But that's why you shouldn't expect local models to provide quick real-time answers, at least not with the same smarts as SOTA models running in the cloud. Slow batched inference (if possible - RAM capacity can obviously be a challenge with typical models and end-user hardware) can be a lot more effective.


fleventynine 36 days ago

My point is that it is WAY more efficient if we put the world's DRAM supply into a shared inference pool instead of stranding it in local machines, where it won't reach as high a batch size or utilization.

The cost of not being efficient is even higher DRAM costs than we have now, given supply and demand.


zozbot234 36 days ago

Much of the world's DRAM stock is sitting idle in consumers' local machines and on-prem servers. If that DRAM gets some use, even "inefficiently", that's a meaningful decrease in demand.


fleventynine 36 days ago

That DRAM would get even more use if it were removed from these machines and placed into a shared pool :) I joke, but thanks to the brutal DRAM market there has been some movement in this direction lately...


CamperBob2 35 days ago

I think the question of who controls the model is far more pressing than the question of who owns the DRAM.

It's easy to rattle off a half-dozen different vectors of likely enshittification over the next few years -- ranging from increasing censorship, to lower rate limits, to removal of existing features and forced addition of unwelcome new ones, to extortionate price increases, to unexplained and irreversible account bans. The only way to avoid them all is by running weights you own on hardware you control.

How smart and how fast is your local model? Those are certainly important questions, but "Does it exist at all?" is more important.


fleventynine 35 days ago

There isn't enough hardware in the world for everyone to run their own SoTA model. The only hope we have is if we work together to host these on shared infrastructure, benefiting from >50x economies of scale due to batching, etc. That infrastructure doesn't have to be owned by greedy corporations.
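
For readers who want to see where a batching multiplier like that can come from, here is a back-of-the-envelope sketch, not a benchmark. It assumes a simple roofline-style model of token generation: every decode step has to stream the full set of weights from memory once, no matter how many requests share that step, while compute grows with the batch. KV-cache traffic, networking, and scheduling overhead are ignored. The function names and all default hardware numbers (a 70B-parameter model at 2 bytes per weight, 2 TB/s of memory bandwidth, 1 PFLOP/s of compute) are hypothetical round figures chosen for illustration and do not come from the thread.

```python
def decode_tokens_per_second(
    batch_size,
    weight_bytes=140e9,      # ASSUMPTION: 70e9 params * 2 bytes each
    mem_bandwidth=2e12,      # ASSUMPTION: 2 TB/s memory bandwidth
    flops_per_token=140e9,   # ASSUMPTION: ~2 FLOPs per parameter per token
    peak_flops=1e15,         # ASSUMPTION: 1 PFLOP/s usable compute
):
    """Aggregate tokens/s for one decode step under a toy roofline model."""
    # The weights are read once per step, shared by every request in the batch.
    memory_time = weight_bytes / mem_bandwidth
    # Compute cost scales linearly with the number of requests in the batch.
    compute_time = batch_size * flops_per_token / peak_flops
    # The step takes as long as whichever resource is the bottleneck.
    step_time = max(memory_time, compute_time)
    return batch_size / step_time


def batching_speedup(batch_size):
    """Throughput at the given batch size relative to a single local user."""
    return decode_tokens_per_second(batch_size) / decode_tokens_per_second(1)


if __name__ == "__main__":
    for b in (1, 8, 64, 512, 1000):
        print(f"batch={b:5d}  speedup={batching_speedup(b):8.1f}x")
```

Under these assumed numbers the speedup grows linearly with batch size while the step is memory-bound, then flattens once compute becomes the bottleneck (around a batch of 500 here). A single-user local machine sits at the batch-of-one end of that curve, which is the inefficiency being described; the exact multiplier on real hardware depends on details this sketch leaves out.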
