by leptons 39 days ago
>running large models on shared, dedicated hosted hardware at full utilization is going to be vastly more cost-efficient for the foreseeable future.

That is only true right now because hundreds of billions of dollars are being burned by these AI companies to try to win market share. If you paid what it actually cost, your comment would likely be very different.

No, it's economies of scale. I don't understand where anyone is coming from who thinks they'll be better off buying their own hardware. Why would you get a better deal on MATMULs/watt than the cloud providers?
Within 5-10 years you're going to see a box like one of those AMD Halo nodes running homes.

They'll be controlling lights and temperature, they'll be adding calendar reminders that show up on your phone and your fridge. Your phone and devices might sync pictures and videos there instead of the large cloud providers. They'll also be a media server, able to stream and multiplex whatever content you want through the home. They'll also be a VPN endpoint, likely your home router, maybe also a wifi access point.

I think this makes quite a bit of sense. I don't think they'll be ubiquitous, but they could be.

This distributes the power demand to where local solar generation can supplement it, gives the home user a lot of control, and reclaims ownership of user data from big tech.

Maybe I'm imagining things but this is what I think is coming.

It's the LLM/data heart of the home. A useful digital tool.

It's amazing to me. You say this like it isn't an absolute horror. We've really ramped up the malignant bloat of the software industry if it goes this way.

We'll have this massive machine to do "home automation", something that by all rights should be possible with less computing than is deployed in smartwatches today. Yuck...

Moving the LLM from SaaS to the home, reducing the power distribution problem, and giving people control back over their data - getting it away from Big Tech. The home controls should also be more responsive than most current home automation, which mostly uses wireless and Bluetooth to reach a cloud service. These are all good things.

That's just one piece of the puzzle. If you're running the LLM there's no reason your family's mobile devices couldn't use said home LLM box to save battery life on their devices while maintaining control of their data, searches, photos, files, etc.

Umm, you can do basically all of this, today, with Home Assistant and a handful of add-on apps.

I use a local LLM with it, but you can use a hosted LLM if you like.

The core home automation stuff can run on a potato. The LLM just writes new automations when I ask it, or acts as a natural language interface.

I use a pretty small 4B parameter local LLM, on a fairly modest mini PC. It doesn't take a frontier model to do that kind of work.

I am 1000% aware of this, but I think we're going to see more packaged solutions on the hardware front.
Another victim of Goldratt's Theory of Constraints. Some things are more important to optimize for than MATMULs per Watt. What that is I leave as an exercise to the student. May you realize what it is before it is too late.
Some individuals will choose some $10,000 hardware so they can keep freedom and privacy, and that's well and good. My point is just that freedom and privacy is not what wins marketshare, and hence, IMHO, local LLMs are not going to catch up to and surpass frontier models like some in this thread like to claim.
> freedom and privacy is not what wins marketshare

Digital sovereignty laws may mandate/remove access to LLMs of other countries on economic and national security grounds.

We don't know the parameters, but it probably takes at least an H100 and possibly several to run a SOTA model. Given the pricing ($25k+ per H100 + hardware to run it) and power (700W per H100 + hardware to run it), I don't see how anyone except for a largish company can afford to run this.
Are you serious? It’s multiple nodes to run a frontier model (a node is 8x GPUs), and they aren’t running on H100s. You are looking at 32+ GPUs.
I was being pretty generous to the comment I was replying to. Needing 32+ H100s just strengthens my argument that people aren't going to run frontier models locally anytime soon.
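
For a rough sense of scale, here is a back-of-the-envelope sketch using only the figures quoted in this thread ($25k+ and 700W per H100, 8 GPUs per node). The H100 numbers are a stand-in, since the reply above says frontier models aren't actually served on H100s, and the function name `cluster_estimate` is just made up for illustration. Host hardware, networking, and cooling are left out, so treat every result as a lower bound.

```python
# Back-of-the-envelope figures taken from this thread, not vendor quotes.
H100_PRICE_USD = 25_000   # "$25k+ per H100", excludes host hardware
H100_POWER_W = 700        # "700W per H100", excludes host hardware
GPUS_PER_NODE = 8         # "a node is 8x GPUs"


def cluster_estimate(num_gpus: int) -> dict:
    """Return lower-bound, GPU-only cost and power draw for num_gpus H100s."""
    nodes = -(-num_gpus // GPUS_PER_NODE)  # ceiling division
    return {
        "gpus": num_gpus,
        "nodes": nodes,
        "gpu_cost_usd": num_gpus * H100_PRICE_USD,
        "gpu_power_kw": num_gpus * H100_POWER_W / 1000,
    }


# Compare a single GPU, one full node, and the 32-GPU case mentioned above.
for n in (1, 8, 32):
    est = cluster_estimate(n)
    print(f"{est['gpus']:>2} GPUs / {est['nodes']} node(s): "
          f"${est['gpu_cost_usd']:,}+ and {est['gpu_power_kw']:.1f} kW+")
```

Under those assumptions, 32 GPUs works out to at least $800,000 in GPUs and 22.4 kW of GPU power draw alone, before counting anything else in the rack.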