by simonw 9 hours ago

I think gemma-4-26b-a4b and Qwen3.6-35B-A3B show that there's something very interesting about a local model that does mixture-of-experts (which helps a lot with performance) and has on the order of 30 billion parameters.

These models are very capable, and use around 20-30GB of RAM while they are running.

Provided you have 64GB of RAM that leaves space for running other applications at the same time.
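
For a rough sense of where a figure like 20-30GB could come from, here is a back-of-the-envelope sketch. It assumes memory is dominated by the weights, that a mixture-of-experts model keeps all of its experts resident (not just the ones active per token), and that GB means 10^9 bytes. The helper name `weight_memory_gb` and the bit widths are illustrative choices, not anything the models above document; KV cache and runtime overhead are ignored.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed quantization levels for a ~30B-parameter model.
for bits in (4, 8, 16):
    print(f"{bits}-bit: ~{weight_memory_gb(30, bits):.0f} GB")
# 4-bit: ~15 GB
# 8-bit: ~30 GB
# 16-bit: ~60 GB
```

Under those assumptions, a 4-bit to 8-bit quantization of a roughly 30B model puts the weights at 15-30GB, which is in the same ballpark as the 20-30GB quoted above once some overhead is added.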

chrisweekly 9 hours ago

Obtaining that 64GB RAM is a meaningful obstacle for many.

simonw 9 hours ago

I'm still amazed that you can run LLMs of this quality on a machine that costs less than $3,000.

I used to assume that anything GPT-4 equivalent or higher would need $30,000+ of server-class hardware.

That said... gemma-4-12b-qat is 7.15GB on disk, so it should run reasonably well in 16GB. That takes it down to MacBook Air territory: https://lmstudio.ai/models/google/gemma-4-12b-qat
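
As a sanity check on that file size, the implied bits per weight can be worked out directly. This is only a sketch: it assumes the "12b" in the model name means roughly 12 billion parameters, that GB means 10^9 bytes, and that the file is almost entirely weights. `bits_per_weight` is a hypothetical helper, not part of any tool.

```python
def bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Average bits stored per parameter, given file size in decimal GB."""
    return file_size_gb * 1e9 * 8 / (params_billions * 1e9)


# 7.15 GB on disk, assumed ~12 billion parameters.
print(f"{bits_per_weight(7.15, 12):.2f} bits per weight")
# 4.77 bits per weight
```

A bit under 5 bits per weight is what you would expect from a roughly 4-bit quantization plus some higher-precision tensors and metadata, and it leaves several GB of a 16GB machine free for context and other applications.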

frollogaston 6 hours ago

Not just RAM, VRAM, right? Though they're one and the same on the Mac.
