> ? Claude, ChatGPT, etc are heinously expensive for tiny benefits lmao
Unfortunately local inference is inefficient, hundreds of times less efficient than cloud. When you answer one request at a time, you still have to fetch all active weights into the compute units once per token. When you run a batch of 300, you load the weights once and compute 300 tokens at a time.
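A rough sketch of that batching argument, not a benchmark: if decoding is limited by streaming weights from memory, the weight traffic per generated token shrinks with the batch size. The 8B active-parameter count and 2 bytes per parameter are hypothetical figures picked for illustration, and the model ignores KV-cache traffic and compute limits, so real-world gains will differ.

```python
def weight_bytes_per_token(active_params: float, bytes_per_param: float, batch_size: int) -> float:
    """Weight bytes streamed from memory per generated token.

    Assumes one full pass over the active weights per decoding step,
    shared by every request in the batch (a simplification).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return active_params * bytes_per_param / batch_size


# Hypothetical model: 8e9 active parameters stored at 2 bytes each.
single = weight_bytes_per_token(8e9, 2, batch_size=1)
batched = weight_bytes_per_token(8e9, 2, batch_size=300)
ratio = single / batched

print(f"batch=1:   {single / 1e9:.2f} GB of weights read per token")
print(f"batch=300: {batched / 1e9:.3f} GB of weights read per token")
print(f"amortization factor: {ratio:.0f}x")
```

Under these assumptions the amortization factor is simply the batch size, which is where the "hundreds of times" figure comes from.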
Compared to cloud, local inference is also less flexible. You can't scale up 5x or 20x, you can't handle spikes, and you pay for the hardware whether you use it or not. Utilization is typically very low, around 5%. And to run a decent model, your system costs $2000 or more.
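To put the utilization point in numbers, here is a minimal sketch that spreads the hardware price over the hours the machine is actually busy. The $2000 price and 5% utilization come from the comment above; the 3-year lifetime is an assumption, and electricity and resale value are ignored.

```python
HOURS_PER_YEAR = 365 * 24


def cost_per_used_hour(hardware_usd: float, lifetime_years: float, utilization: float) -> float:
    """Hardware cost divided by the hours the machine is actually in use."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    used_hours = lifetime_years * HOURS_PER_YEAR * utilization
    return hardware_usd / used_hours


# Assumed 3-year lifetime; price and utilization taken from the comment.
fully_used = cost_per_used_hour(2000, 3, utilization=1.0)
mostly_idle = cost_per_used_hour(2000, 3, utilization=0.05)

print(f"100% utilized: ${fully_used:.3f} per used hour")
print(f"  5% utilized: ${mostly_idle:.3f} per used hour")
print(f"idle penalty: {mostly_idle / fully_used:.0f}x")
```

The point is only that the same box costs 20 times more per useful hour at 5% utilization than at full load, which is a gap a shared cloud GPU largely avoids.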
AI boosters cling to this notion because it's the only way the massive data center buildouts make any sense at all. I guess you could say the US is winning the frontier AI race. Okay. I'm never going to grant a cloud service access to all the contents of my hard drive, that's just never going to happen, so if you expect me and a lot of people like me who feel similarly to get on this train, you better have a local, lightweight model too or we're not even having a discussion, the answer is just no.
The thing is, frontier model providers don’t take your feelings into account even a little bit. It’s totally irrelevant to the discussion about the service they can provide, because that service is predicated on access to high power GPU slices that local models can’t touch. Those providers won’t be in an existential crisis because some people choose the privacy route, it’s a cost of doing business.
Right but that service being sold is predicated on products being sold to users, yes? Or are we still pretending that the hyperscalers can just pass the same $20 billion between themselves and that's going to be a growth industry forever?
I suppose it's possible that all the value needed to pay back the datacenter construction can be squeezed out of enterprise contracts where your employer can assent on the privacy questions, probably with some kind of complicated contract and insurance regime regulating things.
Even if so, if China is coming behind 6 months later selling laptops with hyper-efficient local models that are 80% as good as "frontier" ones, I imagine they'll get the consumer business AND a fair share of the enterprise business as IT managers look at their options during the next refresh cycle.
Given economies of scale, I think it's ultimately inevitable that the enterprise more-or-less follows the consumer on this, and the consumer is going to prefer local models. There's no ongoing cost after the initial purchase, and your data at least nominally stays within your control.
I'm inclined to agree. The business world itself, and frankly you could make this argument about the entire AI industry as it were, runs on "fine", and 80% capable can probably get you there. And it's arguably even better for the hyperscalers, since then the ongoing costs of AI users are basically nil. You can still have your massive datacenters and just keep them for tasks complex enough that they're actually worth spooling up.
Like I don't need an H100 or a dozen to summarize a PDF. And that's most of what I use AI for.
If we are betting on which is the easier sale, $20-100 a month with tech support included vs $5k-10k and a requirement for moderate technical ability, I would bet on the former, not the latter, being the proposition that drives the conversation about AI use.
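For a sense of scale on those two price points, a back-of-the-envelope break-even sketch. It uses only the dollar ranges from the comment and ignores electricity, support time, resale value, and any difference in model quality, all of which would shift the result.

```python
def break_even_months(hardware_usd: float, subscription_usd_per_month: float) -> float:
    """Months of subscription fees that add up to the hardware price."""
    if subscription_usd_per_month <= 0:
        raise ValueError("subscription price must be positive")
    return hardware_usd / subscription_usd_per_month


# Endpoints of the ranges quoted in the comment.
for hardware in (5_000, 10_000):
    for monthly in (20, 100):
        months = break_even_months(hardware, monthly)
        print(f"${hardware} box vs ${monthly}/month: {months:.0f} months ({months / 12:.1f} years)")
```

Even in the most favorable pairing (cheapest hardware, priciest subscription), the hardware takes over four years to pay for itself on these numbers.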
> ? Claude, ChatGPT, etc are heinously expensive for tiny benefits lmao. Local + efficient is clearly the future
Corporate America is where the money is, and corporate America will dictate which products are successful by virtue of spend. Individuals aren't going to be paying $100s or $1000s/month en masse for these models, but businesses will be. Being local and efficient isn't that important at this stage, but even so, as American companies continue to scale and invest, they'll be able to make those models more local and efficient if the market wants it. Sort of like how you had a big, giant desktop computer and now you've got a supercomputer in your phone, which is in your pocket. Going straight to "local and efficient" means going straight to being behind, because at some point, perhaps even now, the local and efficient model won't be able to keep up.
For some reason people think they somehow know something that Google or Nvidia or whoever, with hundreds of billions of dollars of real money at stake, doesn't already know, and it's both amusing and bizarre to see this play out again and again in off-hand comments like "lol tiny benefits".
You buy an iPhone even though the cheap-o Wal-Mart Android phone for $100 "does the same thing". Except that in this case the Android phone just puts you out of business while those spending big money for "tiny benefits" beat you in the market.
> You buy an iPhone even though the cheap-o Wal-Mart Android phone for $100 "does the same thing".
People buy iPhones because of status signalling and network effects, neither of which appears to apply to AI model choice. LLMs are already rapidly on the way to being interchangeable commodities.
No they don't, it's not 2008. Anybody off the street can get an iPhone or a free iPhone with a mobile plan. They're commodity products. Even homeless people have them.
To the extent LLMs are commodity products you're right (so far), but that is limited to the main model providers, such as ChatGPT, Claude, Gemini, &c. with interoperability on cloud platform providers and other technology providers like an Apple offering you a choice of LLM with Siri or something.
If you want to suggest that some other model is in the same bucket as those primary 3, it goes back to the crappy, cheap phone analogy, which is accurate. Yeah, you can make calls with it, but you make calls better with an iPhone.
Ok just remove "crappy" then and replace with low-quality. We can differentiate on low-quality, high-quality, and more when talking about consumer products.
I'm going off of rough memory here, but don't like half of all Americans have an iPhone? Do half of all Americans own a Porsche?
Fine, if you don't like the iPhone analogy then look at Coca-Cola vs store brand Super Cola. They are both brown sugar water, but Coke is Coke. People buy Coke because of the brand, the image and (maybe a little bit) because of the taste.
There is no equivalent in LLM land. AI models are not like Coke vs Other Cola. AI is like electricity or water, a generic commodity with minimal cost or friction to switching. I can flip the VSCode plugin between a dozen different models a day.
They run various schemes like this all the time, you can also trade in your existing phone a lot of times for pretty favorable terms. I've traded in phones that were a few years old and gotten $1000+ for them, especially when switching providers.
Verizon's "free" iPhone deal is you pay for the phone up front and then receive a bill credit. Here's the fine print from one of those deals:
$729.99 purchase on device payment or at retail price required. New line req'd. Unlimited Welcome, Unlimited Plus or Unlimited Ultimate plans required. Less $730 promo credit applied to account over 36 mos; promo credit ends if eligibility requirements are no longer met; 0% APR. Taxes & fees may apply. Credits will appear on your Verizon Wireless bill.
I don't think that's a particularly bold claim after thirty straight years of moving supply chains overseas. Capital is, inherently, the means of production. The world where we could compete is gone.
Capital is not the means of production. Capital is capital. If I have a few million dollars in my bank account I don’t all of a sudden have a factory. Remember from economics class you need capital, labor, and the means of production.
Capital inflows are different from manufacturing outflows. The US has historically imported capital which is part of why we have such a large trade imbalance. I’d encourage you to do some more digging here.
> The world where we could compete is gone.
Sigh, no, that’s just not true at all. We compete hard and fast all day every day, the economy is growing and will continue to do so, and no amount of leftist doomer, Chinese, Iranian, or Russian propaganda changes those facts.
> I have a few million dollars in my bank account I don’t all of a sudden have a factory.
No, but money only has value as a product of the human labor and production capacity it refers to. Money is not capital; it is a reference to/legal coercion of capital.
> We compete hard and fast all day every day
Sir, have you ever been to the US? Lmao. We are only competitive in the industry of white collar work (financial/artisanal services), an industry that capital is actively gutting.