by wolttam 36 days ago

By my estimation (guess) you won't actually need to spend that much, because the models are already getting to a point where they don't need to get a whole lot better to be extremely helpful across many domains.

And it looks like those very helpful capabilities will continue to transfer to smaller models as well, as architectures and training regimes continue to refine.

I can fairly easily imagine a world where the only people needing to spend a lot of money on models are those that are using them to solve truly novel problems. The rest of us will get plenty of use at reasonable costs for the typical day-to-day helpful stuff.

hypercube33 36 days ago

All we need is something like Qwen3-coder-next but at Kimi K2.6 ability so it runs on laptop workstation hardware and we are set...soon?

wolttam 36 days ago

In 2023 GPT-4 was allegedly 1.8T parameters. In 2026 we have ~100x smaller models (10-20B) that handily outperform it, and can indeed run on a laptop.
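For reference, the "~100x smaller" figure can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the 1.8T number is the alleged figure from the comment above, not a confirmed one, and the variable names are made up for illustration.

```python
# Alleged GPT-4 parameter count from the comment (unconfirmed).
GPT4_PARAMS = 1.8e12

def size_ratio(big_params: float, small_params: float) -> float:
    """How many times larger the big model is than the small one."""
    return big_params / small_params

# The comment's "10-20B" range for current small models.
ratios = {b: size_ratio(GPT4_PARAMS, b * 1e9) for b in (10, 20)}

for billions, ratio in ratios.items():
    print(f"{billions}B model: about {ratio:.0f}x smaller")
```

So under that assumption the range works out to roughly 90x to 180x, which is consistent with "~100x".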

WanderPanda 36 days ago

It highly depends on the task. For math and coding, sure. But for knowledge tasks GPT-4 is way better than even SOTA ~100B models. For my knowledge test cases the lines get blurry at >400B.

rectang 36 days ago

How does "outperform" translate to the propensity of an LLM to hallucinate?

operatingthetan 36 days ago

There seems to be a mass delusion about how capable SOTA models actually are. That's my only explanation for how poorly I find them performing in basic knowledge tasks compared to how others describe their prowess.

rectang 36 days ago

I understand you to be implying that I shouldn't trust my perception that there's a meaningful difference in how much different models hallucinate. I will take that under advisement, but I am still interested in the answer to my original question.

operatingthetan 36 days ago

>I understand you to be implying that I shouldn't trust my perception that there's a meaningful difference in how much different models hallucinate.

Nope. Also I'm not GP.

unshavedyak 36 days ago

I am eagerly awaiting being able to run a strong local model. I'd hand Apple $5k right now for a Claude in a box. I know the cost might not be there now, just saying that is around my ideal price point.

$10k might even be worth it - but i'm assuming that the more expensive it is the beefier it is too, which also means more electricity.. and i already run ~6 computers/servers in my house. If a power surge happens i'm going to go live in the woods lol.

atonse 36 days ago

I would do the same but my issue is that the models are changing so fast, so I don't want to be left out of the next model cuz it only runs on an even newer GPU or something like that.

But maybe my limited understanding is thinking of this wrong.

JamesLeonis 36 days ago

I wouldn't worry about hardware.

I've run the latest local models over the last year, including the recent Qwen 3.6 30B A3B, on a 9-year-old GTX 1080 and 32 GB of RAM I have lying around[0]. If I can do that, I don't think hardware will be a problem for you in the near term. The only updates I've needed were to llama.cpp when a new class of model was released.

[0]: In my case, I want to see how local models perform on limited hardware, sacrificing context size and intelligence compared to SOTA models, so I have to really limit my expectations.
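A rough way to see why a 30B-parameter model can fit on that kind of machine is to estimate the size of the weights at different precisions. This is a sketch under stated assumptions: the 8 GB VRAM figure is the GTX 1080's standard spec rather than something stated in the comment, the bit widths are idealized (real quantization formats use somewhat more bits per weight), and the KV cache and OS overhead are ignored.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 30e9  # "30B" from the model name
VRAM_GB = 8          # assumption: standard GTX 1080 VRAM
RAM_GB = 32          # from the comment

for bits in (16, 8, 4):
    size = weight_gb(TOTAL_PARAMS, bits)
    # Weights can be split between GPU and system memory.
    fits = size <= VRAM_GB + RAM_GB
    print(f"{bits}-bit: ~{size:.0f} GB of weights, fits in RAM+VRAM: {fits}")
```

Under these assumptions only the quantized variants fit, and the weights have to be split between system RAM and the GPU, which matches the footnote's point about trading away context size and quality.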

unshavedyak 36 days ago

> I would do the same but my issue is that the models are changing so fast, so I don't want to be left out of the next model cuz it only runs on an even newer GPU or something like that.

I think the same, and it's why i stopped caring about running llama/etc at home last year. That coupled with the models being dumb by comparison to SOTA really makes me fine with waiting.

But in a year or two it's going to be difficult to resist at home, assuming the pace of improvement holds.

DANmode 36 days ago

Focus on what’s actually required for your workflows.

Anything beyond that is just hobby, or continued education.

DANmode 36 days ago

You can run 6-12 month old state of the art models for that type of money,

like, yesterday.

unshavedyak 36 days ago

Yea, but i don't consider them good enough. I barely consider SOTA good enough.

I'm hoping that by the time the rugpull happens with SOTA (claude/etc) that at-home will be in the 4.7-5.5 range? We'll see.

DANmode 36 days ago

They were good enough 6-12 months ago.

Maybe your tooling is what’s keeping you from your dream.

templar_snow 36 days ago

Uh... get a UPS?

unshavedyak 36 days ago

I do, though they're not as bulletproof as you'd hope, to my understanding. Hell, i have one at the house level too, since i have an EV sitting behind that as well.

DANmode 36 days ago

Are you in a region that doesn’t mandate grounded electrical systems?

(UPS is still a great idea for your expensive gear.)

rectang 36 days ago

In my anecdotal experience there is a huge gap between GPT-5-mini which hallucinates relentlessly and Claude Opus or the latest GPTs which are fairly reliable. I'm hoping that gap can be closed with improved approaches for small models and that good reliability is achievable for LLMs without requiring absolutely mammoth computing resources.

For what it's worth, I also used GPT-5.2 (via duck.ai) this year for questions about taxes and it was helpful — which makes sense because there's an abundance of material about taxes out there to be synthesized, so a text predictor trained in that domain should do well.

Barbing 36 days ago

[sci-fi “AGI” scenario] What if those with elite model access philosophize in a way us mere mortals can’t understand, so the elites have to prechew the ideas for us to bring them to our level, and they control the narrative?

In reality now, curious about social implications generally. Does this go beyond problem solving? Maybe the intelligence per token you get via your free library card/membership is insufficient to compete with peers in dating/employment/etc. markets, and thus puts you at a disadvantage.

unixhero 36 days ago

That isn't really philosophy, but rather doom and gloom theories. Control the narrative on what exactly, how I write a bootstrap script for my servers? Or what type of flower is in this photo? Not everything is politics, luckily.

DANmode 36 days ago

Real world AGI scenario:

that’s already how world financial markets and governance work,

and yes, the best of the best models

and $ for tons of compute

will, for now, remain at the top.