HN Mirror

by cauenapier 121 days ago

What’s wild about Step-3.5-Flash isn’t just the quality; it’s how close it is to fitting on personal hardware.

The int4 weights are ~110GB. That sounds insane, but 128GB unified memory machines already exist, and people are running it today. A few years ago, a 200B-class model was pure datacenter territory. Now it’s “expensive laptop / workstation” territory. That’s a huge shift.
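For anyone who wants to sanity-check that number, here is a rough back-of-the-envelope sketch. It assumes "200B-class" means exactly 200 billion parameters (an illustrative round figure, not the model's published count) and counts only the raw weights. The gap between the 100GB it prints for int4 and the ~110GB quoted above is plausibly quantization scales plus a few layers kept at higher precision, but that is a guess. The helper name `weight_footprint_gb` is made up for this example.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights alone, in decimal gigabytes.

    Ignores KV cache, activations, and quantization metadata.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "200B-class" taken as exactly 200 billion parameters.
N_PARAMS = 200e9

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_footprint_gb(N_PARAMS, bits):.0f} GB")
```

On these assumptions, fp16 and int8 are far beyond a 128GB machine, while int4 leaves some headroom for the KV cache and the OS.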

The interesting part isn’t that this model is big. It’s that hardware curves and model efficiency are finally intersecting. Sparse MoE + quantization means frontier-ish reasoning is no longer locked to hyperscalers. We’re basically one consumer hardware generation away from this class of model being normal for power users.

1 comment

mh3467 121 days ago

Indeed. The direction is promising: the democratization of frontier intelligence. Your personal assistant (this and that Claw) would be powered not by commercial models via API but by a model smart enough and small enough to be hosted on your own device.
