
by TheJCDenton 37 days ago

For the mainstream audience, the sentiment around local AI today is the same as the one they had around open source a few decades ago. For a few products, some paid solutions were so much more advanced that open source was very often completely overlooked. Why bother? And the like. Then we had captive SaaS and other platforms, and now that attitude is obviously wrong for most of us.

The dependency we have on Anthropic and OpenAI for coding, for instance, is insane. Most accept it because either they don't care, or they just hope the Chinese labs will never stop releasing open weights. The business model of open weights is very new, includes some power play between countries and labs, and moves an absurd amount of money without any concrete oversight from most people.

It's a very dangerous gamble. Today incredible value is available for nearly everyone. But it may stop without any warning, for reasons outside our control.

10 comments

oytis 37 days ago

What is the business model of open weight AI? I don't think there is any. At best it can serve as an advertisement for the more advanced models you sell.

The huge difference to open source is that you can't just train an LLM with free time and motivation. You need lots of data and a lot of compute.

I sure want to be wrong on that, I definitely like the open-weight version of the future more

wood_spirit 37 days ago

Meta released Llama just when OpenAI was so hot and its valuation was going through the roof. Speculating, but Meta probably thought the model not competitive enough to keep as a secret weapon but well good enough to commercially damage OpenAI who were a sudden competitor for most-valued-company?

In the same way you can imagine the Chinese government pushing the release of deepseek etc to make sure no one thinks the US has “won” and to keep everyone aware that a foreign model might leapfrog in the short term future etc.

At some point though if OpenAI/Anthropic/Google plateau or go bust then the open source sponsorship becomes less likely, as making it open source was a weapon not a principle.

2ndorderthought 37 days ago

I disagree. I think deepseek, qwen, and kimi earn a lot of trust open sourcing their models. While still profiting.

Effectively they are saying "yea don't crowd our data centers with small queries, go ahead and send your frontier questions to our frontier models. Oh btw those us models? You can run something about as good for free from us if you want hah." It's a power and marketing move. It's also insanely smart to keep up with it to remain sustainable as a brand. Especially given how small their investments into this are.

Look at anthropics growing pains. Deepseek has other hosts spreading their brand for free while they grow. Brilliant honestly. In my opinion it makes anthropic and openai look clueless on a lot of levels.

China is playing a different game here. To them this is commoditizing their complement and building good will. The Chinese economy doesn't teeter on the brink of collapse to deliver frontier grade LLMs. Nope, Alibaba just made qwen because it needs it. It needs efficient models. Similarly, in China they manufacture and automate so much more than the US ever could. LLMs to them are a topping not the whole meal like they are in the us.

mystraline 37 days ago

That's because the USA has really nothing big to export. Yay, designs.

China? I'm getting ready to watch the URKL (universal robot knockout league) go on. The USA is dicking around with failed robot dogs.

The USA has been a failed country, coasting on massive inertia. But the tech avenues from an article I can't find showed the USA excelling in 8/64 areas. China was excelling in 56/64 areas.

WarmWash 37 days ago

China is an advanced 2nd world country with pockets of first world.

Smart people in China design fast manufacturing lines for $25k/yr.

Smart people in the US design bond hedging strategies or ad-pixel trackers for $250k/yr.

China is in the stage the US was in 60 years ago, and eventually those high paying, high impact jobs will suck the intelligence out of all the "blue collar" work. Just like it did in the US.

2ndorderthought 37 days ago

I believe it. The us intentionally lacks accountability to prop up the already wealthy in almost all of its ventures. Which socializes losses and capitalizes gains. It's an economic model that guarantees deterioration and stagnation.

Dodging politics, the power structures in us industry need serious revamping.

mrleinad 37 days ago

China is going to be the next Germany: a loser in the new world without globalization

watwut 36 days ago

> That's because the USA has really nothing big to export. Yay, designs.

USA exports and exported services, especially in IT. And a lot. USA has nothing to export is true only if you intentionally ignore stuff USA exports.

sillysaurusx 37 days ago

If this is true, then why are most of the companies that change the world founded in the US?

WarmWash 37 days ago

The Chinese labs don't have to make money or be profitable. They are funded by the state to achieve the state's goals, and the global praise of their open models just serves as Chinese soft power.

They're state companies, not some kind of ethical VC charity fund project.

2ndorderthought 37 days ago

The fun part is, they are making money and have way less to pay off despite 100s of billions in donations than the US companies do.

Spooky23 37 days ago

Is it so different?

If the US’s fascist experiment continues past the current president, we’ll absolutely be nationalizing frontier companies or exerting equivalent control.

treis 37 days ago

Yes, China is very different from the US.

try-working 37 days ago

Correct. Open source is a PR and marketing strategy for new labs, regardless of origin.

https://try.works/#why-chinese-ai-labs-went-open-and-will-re...

D2OQZG8l5BI1S06 37 days ago

Interesting article, but Qwen does seem to be closing off. They don't release big variants anymore, and I'm not sure that the fact the local-LLM community keeps praising it actually increases the number of people using their API.

It did work for Deepseek for sure and it seems to move the needle for Xiaomi's MiMo; but will it be enough for Qwen and Gemma? Those are the models you can actually run without going all-in on AI (but only with gaming GPUs and such).

try-working 37 days ago

Definitely. Open releases will accelerate this year, including from Qwen because they're behind in adoption.

HDBaseT 37 days ago

You can still make money on open weight models.

The compute required to run these models is still very far out of reach for the average consumer, or even the enthusiast, so they still sell inference whilst also getting consumer goodwill for providing open weights.

datadrivenangel 37 days ago

And the efficiency! Big accelerator cards are ~100x the throughput per watt in terms of raw processing power.

js8 37 days ago

What is the business model of Wikipedia? I don't think there is any.

Not everything good in our society needs to have a "business model". People still work on it. It's FINE.

sroussey 37 days ago

> What is the business model of Wikipedia?

Donations. Have you donated lately?

Wikipedia is cheap compared to creating and training models.

I don’t think donations will suffice at all.

As an example, we had millions of web developers download and install Firebug before browsers shipped their own dev tools. Donations over the course of multiple years would have paid my salary for a month if I were not a volunteer.

But from the “it’s fine” point of view, models will be baked into your OS.

Then later models will be embedded into hardware. Likely only OS makers models.

selcuka 37 days ago

> Wikipedia is cheap compared to creating and training models.

DeepSeek said it spent $5.6M [1] on training V3, which doesn't sound too much for a near-SOTA model.

An open source entity can come up with a hybrid business model, such as requiring a small fee from those who want to host the model as a business for the first n months following the release of a new model, but making it fully free for individuals.

[1] https://arxiv.org/pdf/2412.19437
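For context on where the $5.6M figure comes from, here is a minimal sketch of the arithmetic as the linked report presents it. The GPU-hour breakdown and the $2 per GPU hour rental price are the report's own stated numbers and assumption; the function and variable names are just for illustration. The report also notes that the figure covers only the official training run and excludes prior research and ablation experiments.

```python
# GPU hours as reported in the DeepSeek-V3 technical report (arXiv 2412.19437).
# The $2/hour H800 rental price is the report's assumption, not a measured cost.
GPU_HOURS = {
    "pre_training": 2_664_000,
    "context_extension": 119_000,
    "post_training": 5_000,
}
RENTAL_USD_PER_GPU_HOUR = 2.0


def training_cost_usd(gpu_hours, usd_per_hour):
    """Total rental cost: sum of GPU hours across stages times the hourly price."""
    return sum(gpu_hours.values()) * usd_per_hour


total_hours = sum(GPU_HOURS.values())
cost = training_cost_usd(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR)
print(f"{total_hours:,} H800 GPU hours -> ${cost / 1e6:.3f}M")
```

So the headline number is a rental-price estimate for the final run, not the total spend needed to get there.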

avidphantasm 37 days ago

Ultimately, information is a public good: it is non-excludable (you can’t stop people from using it) and it is non-rival (we can all use it at the same time). Public goods are often very useful, and because they are non-excludable and non-rival, ultimately can’t have a market-based business model. I would class open-weights AI models as public goods, and would support government expenditure to produce them.

phainopepla2 37 days ago

Training AI models is capital intensive, though. Unless there's some sort of mega-crowdfunding effort for open weight model training there needs to be a way to recoup that money on the other end. Either that or state sponsorship I guess

try-working 37 days ago

Open sourcing models is a marketing strategy. Chinese labs and small international labs have no awareness or distribution, so unless they become a hot topic for a while, nobody is going to bother trying out their models. Open source gets them that, and is essentially a tax on newcomers. When you start out you simply have no other option but to open source your models.

So, the business model of open models is the same as closed models: Sell inference. Open source is marketing for that inference.

https://try.works/#why-chinese-ai-labs-went-open-and-will-re...

pabs3 37 days ago

None of these models are open source, they are just public weights, with licensing that sometimes but usually doesn't meet the Open Source Definition.

The Open Source AI Definition (OSAID) is quite ridiculous, I prefer the Debian ML policy for defining freedoms around AI.

https://salsa.debian.org/deeplearning-team/ml-policy/

kranke155 37 days ago

China’s long term goal might just be to own the chip layer alongside everything else, and outproduce the US in data centers.

Frontier US labs could still have an advantage for a long time, but many use cases would start gravitating towards Chinese models if they 10x the data centers and provide similar quality inference for a third of the cost.

PAndreew 37 days ago

Perhaps you can create a compelling UX around it and sell it as a subscription. "Normies" will not be able/willing to build it. You can then patch the model/ship new features around it as it evolves. For example I have built an ambient todo list / health data extractor using Gemma 4 2EB and Whisper. Nothing to brag about but it does fairly decent job even in foreign languages.

karussell 37 days ago

> What is the business model of open weight AI?

This is what I do not understand as well and advertising the knowledge and more advanced model is also the only thing that comes to my mind.

For a month now I have been successfully using gemma4 locally on a MBP M2 for many search queries (wikipedia style questions) and it is really good, fast enough (30-40t/s) and feels nice as it keeps these queries private. But I don't understand why Google does this and so I think "we" need to find a better solution where the entire pipeline is open and the compute somehow crowdfunded. Because there will be a time when these local models will get more closed like Android is closing down. One restriction they might enforce in the future could be that they cripple the models for "sensitive" topics like cybersecurity or health. Or the government could even feel the need to force them to do so.

2ndorderthought 37 days ago

Why would you want to try to support all users' simple queries on your ai data center if they could run them on their own computer?

It builds good will also. It also shows research prowess.

For China it's different. They need to show Americans who don't trust them at all because of propaganda that they have no tricks up their sleeve. It also doesn't hurt when Chinese companies drop models for free people can run at home that are about as good as sonnet. Serious mic drop.

TheJCDenton 37 days ago

Very good point on using local ai to avoid data centers costs.

Running AI models on local hardware was exploratory at first, and if it's so easy today it's thanks to open source. It's a little bit coincidental that we have this today, and that mainstream hardware has this capability. The fact that a phone can run very small models is exploratory, or some kind of marketing opportunity at best.

Why would hardware companies ship cards with more AI capabilities (like more VRAM) in the foreseeable future? On what grounds will the marketing for on-device AI keep generating interest? For something this important, it's very uncertain. But above all, it should not depend on these brittle justifications.

Showing good will in distribution and research prowess today is positive communication, but it can be exactly the opposite if/when an attack using those small models reaches a high-value target.

For China, the cultural difference is so huge that it's difficult to say. I would think they first and foremost need to show everyone inside and outside of China that they match American models. Second, I would say that where Americans prefer a few very powerful companies from the get-go, because they can leverage a lot of capital rapidly to industrialize, China prefers leveraging a lot of smaller companies exploring a lot of things simultaneously (so doing a lot of research), THEN creating legislation so that only the best (or a few) effectively survive. In the end it's the same result (monopoly or oligopoly), but China may have a stronger core (research) and America may have stronger productive capital, which may prove obsolete... In the long run, on either side it's a gamble, again.

2ndorderthought 37 days ago

They have already shown that their models match or excel over American ones in different cases. For cheaper too.

I disagree on the second point. I think most Americans don't prefer less competition, that's a bit antithetical to the free market.

I doubt the Chinese government cares as much about controlling a few companies as you think they do.

China has a few things going for it beyond research. They are mission driven, they actually have needs for this technology, their needs will forward their entire economy as they are the world's largest manufacturers. They are also huge exporters and have buckets of customer support for various languages.

China also has considerably stronger infrastructure for electricity, etc. Even with an Nvidia embargo they are doing more than showing up.

I don't think it's a matter of who "wins". There is no winning. I think China stands to gain far more from LLMs than the US does, and they have proven they don't need the us to do it, even with the us trying to sabotage its every move into the space. The game is already more or less over in my mind.

If anything I see LLMs as having a huge market in China, and now the US can't even sell it to them.

All I care about is, if I have to use this technology, let me run it locally to avoid the surveillance capitalism aspect. That seems to be the real reason the us has propped up its economy in anticipation of this technology. Yet it doesn't benefit the us or me long term.

codebje 37 days ago

I'd expect unified memory architectures (Apple M-series, AMD Ryzen AI series, etc) to be the future of local inference, not GPU cards.

2ndorderthought 37 days ago

Time will tell. Depends on small model architecture trends and hardware availability. I wouldn't be surprised if something came slightly out of left field. Considering Taiwan is trapped into producing the same chips for the next 2 years, I wouldn't be surprised if a new player emerged.

karussell 37 days ago

Indeed cost can be another factor. Maybe also the main reason why Chrome added an offline model.

2ndorderthought 37 days ago

That, and it's lucrative for Android/chrome to have a text summarizer model embedded on your phone, probably for government contracts and data exfil, but we won't go there.

majormajor 37 days ago

> What is the business model of open weight AI? I don't think there is any. At best it can serve as an advertisement for the more advanced models you sell.

I don't think local will necessarily be open-weight. And then it's not that different from personal computing: you're giving up the big lucrative corporate mainframe, thin-client model for "sell copies to a ton of individuals."

So it'd be someone else (an Apple, or the next-year equivalent of 1976 Apple) who'd start eating into that. There are a few on-device things today, but not for much heavy lifting. At first it's a toy, could maybe become more realized in a still-toy-like basis like a fully-local Alexa; in the future it grows until it eats 80-90% of the OpenAI/Anthropic use cases.

Incumbents would always rather you pay a subscription or per-use forever, but if the market looks big enough, someone will try to disrupt it.

treis 37 days ago

Compute has gone back and forth from mainframe/thin client to fat client a few times already. LLMs will probably follow at some point but I think it's going to take a long time.

The cost to transmit text is basically free and instantaneous. The rent (i.e. a GPU in a data center) vs buy is going to favor rent until buy is a trivial expense. Like 50-100 range.

Even then, an LLM that just works is easier than dealing with your own.

majormajor 37 days ago

Storage has moved back and forth but I don't think compute has ever really gone back to thin client. Even Gmail, Google Docs, etc are running a buttload of javascript on the user device. Various attempts at avoiding that (remote .NET or JVM stuff on early "smart-ish" phones) crashed and burned.

Video game streaming is the closest thing, and it's never really taken off. (And this, IMO, is a good comparison because it's a pretty similar magnitude up-front-cost, $500-$4000.)

Once the local-AI-is-good-enough (Sonnet level for a lot of basic tasks, say) for a $1k up-front investment the appeal of having something that can chew on various tasks 24/7 w/o rate limits, API token budget charge concerns, etc, is going to unlock a lot of new approaches to problems. Essentially more fully-baked line-of-business OpenClaw-type things. Or the smart home automation bot of Siri's dreams. You can more easily make that all private and secure when all the compute is local: don't give any outside network access. Push data into the sandbox periodically via boring old scripts-on-cronjobs, vs giving any sort of "agentic" harness external access. Have extremely limited data structures for getting output/instructions back out. I'd never want to pass info about my personal finances into a third party remote model; but I'd let a local one crunch numbers on it.

Even if you need Opus/Mythos/whatever level for certain tasks, if 95% of everything else you'd pay Anthropic or OpenAI for can now be done on things you own w/o third party risk... what does that do to the investment appeal of building better AI appliances to sell end users vs building better centralized models?

I think "what if today's LLM performance, but running entirely under your control and your own hardware" opens up a LOT of interesting functionality. Crowdsource the whole world's creativity to figure out what to do with it, vs waiting for product managers and engineers at 3 individual companies to release features.

treis 37 days ago

There was a time where people ran software on their computer with limited connectivity. Late 90s/early 2000s most of what you did was running locally on your machine. Your emails would be downloaded and there'd be a shared drive but otherwise all local.

Anyways, who's spending $1k for a LLM machine when they can spend $20 (or 0) on a subscription? And who's having an LLM crunching away 24/7 anyways? Anyone who is going to do something like that probably wants a cutting edge model.

It'll (probably) get to a point where the hardware is cheap enough and advancement levels off. But we're a ways from that and even then when a data center is 20ms away why not offload heavy compute that's mostly text in text out.
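The rent-vs-buy argument in this subthread can be put into numbers. Below is a rough sketch, assuming a flat monthly subscription, a one-time hardware price, and an optional monthly electricity cost for the local machine. The $1k and $20 figures come from the comments above; the $5 power cost and the function name are hypothetical, and the sketch ignores model quality, speed, and resale value.

```python
import math


def break_even_months(hardware_usd, subscription_usd_per_month, power_usd_per_month=0.0):
    """Months until cumulative subscription spend at least matches the hardware cost.

    Returns None when running locally never saves money, i.e. the subscription
    costs no more per month than the electricity for the local machine.
    """
    monthly_saving = subscription_usd_per_month - power_usd_per_month
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_usd / monthly_saving)


print(break_even_months(1000, 20))     # $1k box vs a $20/month plan
print(break_even_months(100, 20))      # the "$50-100 range" where buying is trivial
print(break_even_months(1000, 20, 5))  # same box with a hypothetical $5/month power bill
print(break_even_months(1000, 0))      # free tier: buying never pays off on cost alone
```

Under these assumptions the $1k machine takes a bit over four years to pay for itself against a $20 plan, which is roughly the gap the two commenters are arguing over.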

zozbot234 37 days ago

Except that buy is a trivial expense because the hardware has been bought already. You've got a whole lot of iGPU and dGPU silicon that's currently sitting idle as part of consumer devices and could be working on local AI inference under the end user's control.

worldsayshi 37 days ago

It should be feasible to crowd fund training runs right?

dmd 37 days ago

A training run costs somewhere in the neighborhood of a billion dollars. That’s a thousand millions.

How many crowdfunded projects do you know that have raised even one percent of that? Who’s going to be in charge of collecting that scale of money? Perhaps some sort of company formed for the benefit of humanity, which will promise to be a non-profit? Some sort of “Open” AI?

Oh, wait.

derektank 37 days ago

It’s well within the capabilities of governments in developed countries. If Mistral did not already exist, I would definitely expect the French government to invest in a national LLM, if only because of how defensive they are of the French language.

iugtmkbdfil834 37 days ago

<< That’s a thousand millions.

I can't say that you are lying and you are not exactly exaggerating either. It is true that a new SOTA model -- from literal scratch -- would be expensive.

But, and it is not a small but, is the starting point really zero?

jononor 36 days ago

Hardware sales would be an excellent business model for open weights. Nvidia is already on it with their Nemotron models. Any new LLM/NPU hardware companies would want to do the same, if no one else does it for them (Chinese labs currently do).

Selling managed self-hosting solutions would be another. That is the business of that recent American company.

Selling fine-tuning services or similar adaptations is another. That is what Unsloth is going for, I believe.

Most likely any sound business strategy is going to be of the "commoditize your complements" type. There are many complementary products to open-weight - some probably not invented/discovered yet.

fragmede 37 days ago

The business model is the total lack of attention to Qwen and Kimi that would happen if their models weren't downloadable. Before releasing the weights, there was basically zero attention paid in the western hemisphere to them, for whatever reason. By releasing the weights, they're relevant in the western world. The business model is to get people in the West to pay to use their platform hosting their AI, that otherwise would never have heard of them. As you said, advertising/marketing, essentially.

codebje 37 days ago

Baidu have a lot of services I've never heard of, that are highly successful in China. The lack of interest in expanding into Western audiences doesn't seem to matter there - what's different about inference?

fragmede 35 days ago

Looking at Temu and Shein, there's a ton of interest in expanding into western audiences; the difference with inference is that they've found a way to make that happen. Whereas I don't have any use for, e.g., Baidu's equivalent to Google maps because I have, well, Google maps.

sumeno 37 days ago

If a local model hits critical mass the business model is to use it to shape opinions in a way that is advantageous for the company/owners.

Much like the current Twitter model, being able to put your thumb on the scale of "truth". Bake a stronger bias towards their preferred narrative directly into the model. Could be as "benign" as training it to prefer Azure over AWS. Could be much worse.

dleslie 37 days ago

This is where government funding can play a role.

Sometimes there are things where the public good is best served with public expenditure.

CamperBob2 37 days ago

"Government funding" these days would mean that Trump pays Elon Musk (or more likely vice versa) to make Grok 4.20 the only legal LLM for use by Americans.

dleslie 37 days ago

Outside of the USA it would not look like a wealth transfer to an oligarch.

Not every country is in a crypto-libertarian race to hoard power and wealth.

CamperBob2 37 days ago

> Not every country is in a crypto-libertarian race to hoard power and wealth.

Meanwhile, in the EU, the model would be collectively financed, trained by a competent, neutral agency... and then completely lobotomized in the name of "the children," "safety," "IP rights," "correct speech," dozens of individual countries' legal and regulatory requirements, and any number of additional vocal, noncontributing NGOs.

So no one would get rich off of the public model, but no one would get much of anything else out of it, either.

As another reply suggests, there's a reason why things happen in the USA first. Even when they don't, the prime movers move here as soon as they can. Or at least they used to.

dleslie 36 days ago

European models are competitive, despite the concerns you raise.

I don't need a model that can easily produce CSAM or reproduce copyrighted works verbatim in order to be productive.

thefounder 37 days ago

Cloud providers have incentives to release open source models but for some reason this happens only in China. Amazon, Azure, Google benefit from open source models because people run them on their hardware.

apublicfrog 37 days ago

> It's a very dangerous gamble. Today incredible value is available for nearly everyone. But it may stop without any warning, for reasons outside our control.

What stops you from running the best open weighted LLMs currently available on consumer grade hardware for the rest of time? They're good enough for 95% of use cases, and they don't have a used by date. From what I can see, the "danger" is not having the next tier that comes out, but the impact of that is very low.

giobox 37 days ago

> they don't have a used by date

For quite a lot of use cases, the current systems arguably do get worse over time if not continually updated. The knowledge cutoff date will start to hurt more and more as the weights age in a hypothetical scenario where you are stuck with them forever.

Coding, one of the most popular use cases today, would not be great if it, say, only understood java up to a version from years ago etc.

https://en.wikipedia.org/wiki/Knowledge_cutoff

mrtesthah 37 days ago

>Coding, one of the most popular use cases today, would not be great if it, say, only understood java up to a version from years ago etc.

This LLM trained only and entirely on pre-1930s texts was able to code Python programs when given only a short example:

https://talkie-lm.com/introducing-talkie

throwyawayyyy 37 days ago

One solution is not to advance anything of course. I'm not even joking, is there going to be a successor to React? I suspect not, with the vast amount of training data for React now, it's going to look silly to move to something else with less support. What is the last new popular programming language, rust? Will there be another one? I suspect not. Same reasoning. The irony of all this AI acceleration talk is it'll work best if we don't accelerate the underlying tech at all.

WarmWash 37 days ago

There probably won't be new stuff so much as trends in how stuff is done, and updates around optimizing those trends.

jvm___ 37 days ago

Will programming languages evolve into less human oriented written code and more just calls to a trusted AI?

Or will human readable code be less and less of a thing as AI learns its own, more terse language to talk to other AIs?

digitaltrees 37 days ago

Yes. I am seeing a big push to use vanilla js for single file html apps that are easy to build, deploy and distribute because they have no build step. I could see component libraries emerging that make it easier to build from chat interfaces with less ceremony.

byzantinegene 37 days ago

i'm not sure the tradeoff in code readability is worth it as of now.

Spooky23 37 days ago

A lot of the language work is scratching the itch of engineers and developers. I think you’re correct and react is the new COBOL.

hadlock 37 days ago

Name/post content combo on point

apsurd 37 days ago

Humans are notoriously bad at predicting the future. Toward that end, your prediction is laughable. React is the end all be all of UI… lol

melagonster 37 days ago

Programmers won't be allowed to exist in future. Vibe coding is the final resolution people can apply.

nullc 37 days ago

Small models are more useful for "doing stuff" than "knowing stuff" to begin with. Add in an agentic harness and a small model can happily read more current information on demand (including from e.g. a local wikipedia snapshot).
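To make the "read current information on demand" idea concrete, here is a toy sketch of the retrieval half of such a harness. Everything in it is a stand-in: `SNAPSHOT` is a hypothetical three-entry dictionary in place of a real local Wikipedia dump, ranking is plain word overlap rather than anything a real harness would use, and no model is called; the code only builds the prompt a small local model would receive.

```python
import re

# Hypothetical stand-in for a local Wikipedia snapshot: title -> article text.
SNAPSHOT = {
    "Linux kernel": "The Linux kernel is a free and open-source Unix-like kernel.",
    "Java (programming language)": "Java is a high-level, class-based programming language.",
    "Knowledge cutoff": "A knowledge cutoff is the point after which a model has no training data.",
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, snapshot, k=1):
    """Return the k articles sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        snapshot.items(),
        key=lambda item: len(query_words & tokenize(item[0] + " " + item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, snapshot):
    """Put the retrieved text in front of the question for the model to read."""
    context = "\n".join(f"[{title}] {body}" for title, body in retrieve(query, snapshot))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


print(build_prompt("what is a knowledge cutoff", SNAPSHOT))
```

The point of the design is that the facts live in the snapshot, which can be refreshed cheaply, while the model only has to read and reason over what it is handed.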

henry_kang 34 days ago

This feels increasingly true.

A lot of useful AI work is shifting from “knowing more” to “working with more context”, files, recordings, repos, screenshots, browsing history, etc.

Once that happens, memory and orchestration start mattering much more than raw model size.

rrvsh 37 days ago

Nobody is unaware of the knowledge cutoff, and sharing the Wikipedia article is not helping anyone. Your point is easily rebutted by taking whatever open weights/source model has an outdated cutoff and training or fine tuning it on more data, which is again always going to be viable given a modicum of compute

tcp_handshaker 37 days ago

You could learn how to code...a whole generation did it before...

AlienRobot 36 days ago

I genuinely don't understand how can this possibly be a problem long term.

It feels very obvious that the solution is to have a smaller model that can be trained exclusively on Java information to augment the older model. If the architecture doesn't support it currently, then that's what the architecture will look like in the future.

Otherwise you'd be arguing that, to serve users who want an up-to-date LLM on topic X, you have to train the model on the entire ABC all over again.

It's simply ludicrous to have a coding LLM that needs to be retrained on the latest published poems and pastry recipes to generate Java.

lowbloodsugar 36 days ago

Laughs in JDK8 code base.

moffkalast 36 days ago

Ha yes I used to think this was not a notable issue, but just today I was getting qwen 3.5 to fix my network drivers and it immediately freaked out like: "kernel 6.17, what the fuck? that doesn't exist yet!". It almost had a mental breakdown over that detail and derailed the conversation towards checking what's wrong with the kernel version reporting lol.

turtlebits 37 days ago

FOMO. A new model comes out weekly and the HN crowd debates over the minutiae of changes.

Pockets are too deep, it will only change once everyone is out of money.

3eb7988a1663 37 days ago

What is really amusing to me is how N months ago, the latest SOTA was incredible, but now utterly unusable. Feels like there is a model reality-distortion field in play where people can only acknowledge the flaws in retrospect.

lxgr 37 days ago

They’re really not good enough, unless you consider 64 GB of memory or more consumer grade.

steve_adams_86 37 days ago

I’m pretty happy with what a 32GB Mac Studio can do for a lot of tasks. They’re the things I’d throw a model like Haiku at, but still genuinely useful. We don’t have an answer to frontier models in the consumer range yet, but we’re not totally trapped.

Side note though, it’s the speed that bothers me more than the reasoning. Qwen 3.5 is awesome, but my Claude subscription can tear through similar workloads an order of magnitude faster than my local LLM can when using Haiku. That’ll matter a lot to some people.

datadrivenangel 37 days ago

Yeah this is the real killer. slower and more expensive is tough.

avazhi 37 days ago

> What stops you from running the best open weighted LLMs currently available on consumer grade hardware for the rest of time?

Uh… the hardware requirements? And stop acting like some dog shit 8B model the average Joe can run on a laptop is even close to being comparable to what Claude or even Codex can currently do.

I have pretty good hardware and I’ve tinkered with the best sub-150B models you can use and they are awful compared to Anthropic/OAI/Grok.

apsurd 37 days ago

What if the harness and loops get sufficiently better though? CC is using Haiku for codebase grepping and such; you don't see a local commodity model being "good enough" for the 80% case when matched with better harnesses and tool calls?

honest question, i'm very interested in this, but too casual as of now to know any better.

avazhi 35 days ago

I think the main issue is, as the other guy also alluded to, the parameter discrepancy. I know Mixture of Experts models are popular specifically because they save a lot of space and memory, but if your initial answer space is two orders of magnitude smaller on a local machine compared to the frontier cloud models, that knowledge gap just gets wider as the conversation continues, and the initial answer isn't even going to be as good to begin with. I don't know how to solve that parameter gap without hardware - there's only so much optimisation you can do, but at the end of the day parameterised knowledge takes up some minimum number of bits that you can't excise without the actual knowledge and intelligence suffering.
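
As a rough illustration of the parameter gap described above, here is a minimal sketch of the arithmetic for how much memory the weights alone need at a given precision. It assumes dense storage at a fixed number of bits per weight and ignores KV cache, activations, and runtime overhead, so real requirements are higher. The function name `weight_memory_gb` and the parameter counts are illustrative only (the 8B, 32B, 150B, and roughly one trillion figures are the ones mentioned in this thread, not verified model sizes).

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes).

    Assumption: every parameter is stored at the same precision and nothing
    else (KV cache, activations, runtime overhead) is counted.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Parameter counts (in billions) taken from the sizes discussed in the thread.
    for params in (8, 32, 150, 1000):
        row = ", ".join(
            f"{bits}-bit: {weight_memory_gb(params, bits):7.1f} GB"
            for bits in (16, 8, 4)
        )
        print(f"{params:>5}B params -> {row}")
```

Under these assumptions a 32B model quantized to 4 bits needs about 16 GB for weights, while a model with around a trillion parameters would need about 500 GB even at 4 bits, which is the kind of gap that quantization alone does not close.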

byzantinegene 37 days ago

vast majority of average users don't use llms for coding, and for those purposes, local llms with low param count are a far cry from SOTA models.

apublicfrog 37 days ago

> And stop acting like some dog shit 8B model the average Joe can run on a laptop is even close to being comparable to what Claude or even Codex can currently do.

I'm not; you've actually illustrated my point. LLMs in 2022 were very impressive. By 2024 the general public was finding them an acceptable replacement for many research-driven tasks and massive shortcuts for other tasks (coding, image work, document preparation, etc).

Those models are absolutely runnable on consumer hardware now, and we were extremely happy with the results. It's no different to how we used to think CRTs were amazing or early smartphones, but going back now they seem awful.

We're long past "danger". If what we have is the best we'll ever have open source, we're already in an excellent position.

avazhi 36 days ago

> LLMs in 2022 were very impressive.

No they weren't. They were a gimmick - it is only in the past 6 or so months that frontier models have started to do stuff beyond mere gimmicks when it comes to coding, and you could make the argument that Mythos has been the first 'Holy shit' moment that we've had that has stepped us beyond 'Yeah that's really neat but...'

> Those models are absolutely runnable on consumer hardware now,

A sub 50B model is awful and can't even write proper English sentences half the time, to say nothing of how bad its world knowledge is. Try the 32B Gemma 4 local model for a week and then go back to Claude and then get back to me.

> We're long past "danger". If what we have is the best we'll ever have open source, we're already in an excellent position.

Not sure what to tell you other than that you and I have very different standards. What we have locally right now is barely more than a glorified autocomplete, and it feels worse than using ChatGPT 2 years ago because the context window is smaller and it doesn't have good webhooks on consumer setups. Another thing I'd say is that you clearly have no clue what 'consumer hardware' means, or what consumers that can even get this stuff running locally would have to do to get it to even rival the frontier models in terms of their usability flow (most consumers aren't going to just boot into Ubuntu and run this thing from a command line), to say nothing of the hardware requirements. I'd love to never use Claude or Gemini or ChatGPT again for both privacy and money reasons, but the quality of outputs, depth of thinking and writing ability of even the very best local models you can run right now is many orders of magnitude below what you get using distributed frontier models, and those 'very best' local models require a top of the line machine that 99.9999% of consumers don't have and would never consider buying. The cloud models all have like a trillion(!) parameters now. It isn't even close.

I sure hope the local side of things massively improves over the next 2-3 years, but based on how this has gone my guess is that in 3 years you'll be lucky, if you have very top of the line hardware, to get benchmark performance that we had 6 months ago with the frontier models. The distributed hardware/memory gap is just too big.

apublicfrog 34 days ago

> No they weren't. They were a gimmick - it is only in the past 6 or so months that frontier models have started to do stuff beyond mere gimmicks when it comes to coding

This is simply untrue. Using agentic orchestration I was writing production code daily 3 years ago. Hallucinations happened sometimes and the context window was smaller (so you had to do some funky workarounds to deal with larger codebases), but it was workable. There have been a lot of marked improvements from a code perspective since then - a lot model-related, yes, but also a lot in the ease of use, interfaces, etc.

> Another thing I'd say is that you clearly have no clue what 'consumer hardware' means, or what consumers that can even get this stuff running locally would have to do to get it to even rival the frontier models in terms of their usability flow (most consumers aren't going to just boot into Ubuntu and run this thing from a command line), to say nothing of the hardware requirements.

You've moved the goalposts. My point was that the "danger" of no new open models being released isn't that high as the existing ones are already impressive. Their ease of use or daily driving isn't relevant to that. If there were a need, someone could wrap a clean interface and support around it, or run it as their own cloud solution.

You seem to be arguing something adjacent to my point, which is fine I guess but I have little to say. Also multiple of your comments have come across quite aggressive and rude. Just food for thought if you want to work on that or not.

root_axis 37 days ago

> They're good enough for 95% of use cases

They're not at all, not even close. Especially when you consider the use cases for people who are paying for LLM services today.

nightski 37 days ago

Hardware. Frontier labs are driving up demand so much that it's priced significantly above cost making it far less affordable. Just look at Nvidia's profit margins.

suika 37 days ago

The use cases in the future will be nothing like the use cases from today.

apublicfrog 37 days ago

Maybe. The use cases people primarily use LLMs for (documents, coding, design, research) existed decades ago with different tooling. Who knows if the future will have a slew of new problems that require new models or will continue to be similar?

ai_fry_ur_brain 37 days ago

95% of use cases? What are you smoking?

selcuka 37 days ago

There are very good open weight models (such as DeepSeek v4 Flash) that can run on consumer level hardware.

Note that we are talking about 95% of everyone's use cases, not your specific use cases (which could require better models all the time).

irishcoffee 37 days ago

I own 2 5070TI cards in a rig I would gladly donate time to for a distributed training model effort. The kicker is the training data. I would want to gate the data to anything before 2022. I don’t know how to coordinate that, but I would really like to be involved in something like this. SETI, for LLMs.

AlexCoventry 37 days ago

Bandwidth is the killer, in distributed LLM training.

irishcoffee 37 days ago

What’s the rush?

codebje 37 days ago

It depends on the purpose for the model. AFAIK LLMs aren't particularly capable at researching answers, relying more on having 'truth' baked in to their weights, so if it takes 12 months to train up a crowd-trained LLM it'll be 12 months behind the times.

How serious a risk is poisoned weights?

Can we leverage the cryptobros into using LLM training as a proof of work?

MarsIronPI 37 days ago

What? I use Qwen 3.5 35B-A3B and it definitely knows how and when to do web searches to fill in gaps in its knowledge.

codebje 37 days ago

Does Qwen3.5 know it needs to do this because the API in question has had loads of churn and much of its training data is on obsolete versions, or do you need to prompt it? How well does it handle having an API reference with sample code in its context window?

Having an LLM use a web search tool isn't the same thing as researching a topic, IMO, because it's so ephemeral and needs constant reinforcement. LLMs aren't learning machines, they're static ones.

slicktux 37 days ago

I’m just waiting for the US Government to implement their own local AI. That will eventually lead to them open sourcing it, because it’s taxpayer funded, and given that the NSA has decades' worth of internet data they can train on, the open weights would be just as good as any company’s…

riponcm 28 days ago

How does it help?

fragmede 36 days ago

with this administration?

ios-contractor 37 days ago

I don't think it should be local vs cloud AI. I think local AI should be treated as a separate product. Local AI should do the things that really don't need cloud AI, and cloud AI should be used as a fallback. That would reduce a lot of costs.

digitaltrees 37 days ago

Exactly this. The assumption that your access will last is very risky. And assuming that Chinese companies will keep trying to erode the economic viability of American models by open sourcing reverse-engineered models forever is naive.

aabhay 37 days ago

Disagree with this. When cost becomes an important factor or the free but worse option becomes compelling and accessible (i.e. an on-device agent via Apple-style UX), there has been a significant shift in user behavior towards local. Think about stuff like removing backgrounds from photos or OCR on PDFs: who uses paid services for casual usage of these things?

iLoveOncall 37 days ago

The mainstream audience does not have the faintest idea that "local AI" is even a thing.

CamperBob2 37 days ago

Just as their counterparts in 1975 had no idea that "personal computers" were even a thing.

Read through a 1970s-era issue of Popular Electronics or Byte, and then spend some time surfing /r/LocalLlama. You'll get a sense of real-time deja vu, like you're watching history unfold again.

beloch 37 days ago

Keep the Silicon Valley pattern in mind:

1. Innovate, create, and offer it all at sweetheart prices to the public while you rack up debt.

2. Shovel in more money and either buy out or outlast the competition. Become dominant. Lock in your users any which way you can.

3. Enshittify and cash in.

The deals Anthropic, OpenAI, etc. offer won't stay this good much longer. Don't let them lock you in. Failing that, you should budget more for the same service. You're going to need it. Having an open alternative running on your own hardware offers non-negligible peace of mind.

furyofantares 37 days ago

What's the gamble here exactly? What agency do we have in it right now?