Hacker News
by ecshafer 1186 days ago
I was wondering today if we would start to see the reverse of this: small ASICs or some kind of LLM-optimized GPU for desktops, or maybe even laptops or mobile. It is evident, I think, that LLMs are here to stay and will be a major part of computing for a while. Getting this local, so we aren't reliant on clouds, would be a huge boon for personal computing. Even if it's a "worse" experience, being able to load up an LLM on our computer, tell it to only look at this directory and help out would be cool.
5 comments

In fact, Qualcomm has announced a "Cloud AI" PCIe card designed for inference (as opposed to training & inference) [1, 2]. It's populated with NSPs like the ones in mobile SoCs.

[1] https://www.qualcomm.com/products/technology/processors/clou...

[2] https://github.com/quic/software-kit-for-qualcomm-cloud-ai-1...

Software/hardware co-evolution. Wouldn't be the first time we went down that road to good effect.

For anything that can be run remotely, it'll always be deployed and optimized server-side first. Higher utilization means more economy.

Then trickle down to local and end user devices if it makes sense.

Apple, Intel, AMD, Qualcomm, Samsung, etc. already have "neural engines" in their SoCs. These engines continue to evolve to better support common types of models.
Why is the sentiment here so strongly that LLMs will somehow be decentralized and run locally at some point? Has the story of the internet so far not been that centralization has pretty much always won?
Hackers want to run LLMs locally just because. It's not a mainstream thing.
It makes business sense as well. It doesn't make much sense to build an entire company around the idea that OpenAI's APIs are always available and you won't eventually get screwed. "Be careful of basing your business on top of another" and all that yadda yadda.

If you want to build a business around LLMs, it makes a lot of sense to be able to run the core service of what you want to offer on your own infrastructure instead of relying on a 3rd party that most likely doesn't care more than 1% about you.

Running LLMs on your own servers doesn't mean PCs, which is what this thread is about. An A100/H100 is fine for a business, but people can't justify them for personal use.
Because that is pretty much the pendulum swinging in the IT world. Right now it is solidly in 'centralization' territory; hopefully it will go back towards decentralization again in the future. The whole PC revolution was an excellent datapoint for decentralization. Now we're back to 'dumb terminals', but as local compute strengthens, the things that you need a whole farm of servers for today can probably fit in your pocket tomorrow, or at the latest in a few years.
Not sure this really tracks. Local compute has always been strengthening as a steady incline. Yet we haven't really experienced any sort of pendulum shift, it's always been centralization territory.

The reasoning seems mostly obvious to me here: people do not care for the effort that decentralization requires. If given the option to run AI off some website to generate all you want, people will gladly do this over using their local hardware due to the setup required.

The unfortunate part is that it takes so much longer to create not-for-profit tooling that is just as easy to use, especially when the temptation to turn it into your own for-profit business is so strong in such a lucrative field. Just ask the people who have contributed to Blender for a decade now.

Absolutely not. Computers used to be extremely centralized and the decentralization revolution powered a ton of progress in both software development and hardware development.

You can run many AI applications locally today that would have required a massive investment in hardware not all that long ago. It's just that the bleeding edge is still in that territory. One major optimization avenue is improving the models themselves. They are large because they have large numbers of parameters, but the bulk of those parameters have little to no effect on the model output. There is active research on 'model compression', which has the potential to extract the working bits from a model while discarding the non-working bits without affecting the output, and to realize massive gains in efficiency (both in power consumption and in running the model).

Have a look at the kind of progress that happened in the chess world, where the initial huge ML-powered engines are beaten by the kind of program that you can run on your phone nowadays.

https://en.wikipedia.org/wiki/Stockfish_(chess)

I fully expect something similar to happen to language models.
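
To make the 'model compression' idea above concrete, here is a minimal sketch of one well-known variant, magnitude pruning: zero out the weights with the smallest absolute values and keep the rest. The function name and the toy weight list are made up for illustration. Real pruning works on whole tensors, usually needs fine-tuning afterwards, and generally does change the output somewhat, so treat this as a toy rather than how any particular library does it.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0.

    `sparsity` is the fraction of weights to drop, between 0.0 and 1.0.
    This is a hypothetical helper for illustration only.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")

    n_drop = int(len(weights) * sparsity)
    if n_drop == 0:
        return list(weights)

    # Rank weight positions from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_drop])

    # Zero the dropped positions, keep everything else unchanged.
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy example: drop the half of the weights that contribute least.
toy_weights = [0.9, -0.01, 0.5, 0.02, -0.7, 0.001]
pruned = magnitude_prune(toy_weights, 0.5)
```

The zeros only save memory or compute if the runtime stores and multiplies the weights in a sparse format, which is part of why hardware support matters here.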

The bleeding edge will always be in that territory. It still requires a massive investment today to run AI applications locally to produce anywhere near as good results. People are spending upwards of $2000 for a GPU just to get decent results when it comes to image generation, many forgoing this entirely and just giving Google a monthly fee to use their hardware.

Which is the point: decentralization will always be playing catch-up here unless something really interesting happens. It has absolutely nothing to do with local compute power, which has always been on an incline. We just get fed scraps down the line.

Today's scraps are yesterday's state of the art, and that's very logical and applies to far more than just AI applications. It's the way research and development result in products and their subsequent optimization. This has been true since the dawn of time in one form or another. At some point stone tools were high tech and next to unaffordable. Then it was bronze, then iron, and at some point we finally hit steam power. From there to the industrial revolution was a relatively short span, and from there to electricity, electronics, solid state, computers, personal computers, mobile phones, smartphones and so on in ever decreasing steps.

If anything, the steps now follow each other so closely that we have far more trouble tracking the societal changes and dealing with them than we have with the lag between technological advancement and its eventual commoditization.

Nvidia's business model encourages this for starters. They charge a huge markup for their datacenter GPUs through some clever licensing restrictions. So it is cheaper per FLOP to run inference on a personal device.

Centralization of compute has not always won (even if that compute is mostly controlled by a single company). Take, for example, the failure of cloud gaming vs consoles, and the success of Apple (which is very centralized but pushes a lot of ML compute out to the edge).

I think the sentiment is both. There will be advanced centralized LLMs, and people want the option to have a personal one (or two). There needn't be a single solution.
Sure, for big business, but torrents are still alive and well.
I think it's because it feels more similar to Google Stadia than to Facebook.
A couple of the big players are already looking at developing their own chips.
Have been for years. Maybe lots of years. It's expensive to have a go (many engineers plus the cost of making the things), and it's difficult to beat the established players unless you see something they're doing wrong or your particular niche really cares about something the off-the-shelf hardware doesn't.