

by julianlam 54 days ago

I only started playing around with local inference a couple weeks ago. Prior to that I was just using Gemini via web since it came with my Workspace subscription, but I did not want to be reliant on the cloud.

Others will have a better idea since they've been messing around with local inference longer than I have, but I am quite impressed with the models I have been loading on my laptop with only an iGPU. As of this week I no longer feel like I am playing second fiddle with slow inference and small models. Gemma 4 (and maybe Qwen3.5, haven't tried it yet) seems to have changed the game this month!

Even after trying some absolutely shiiiiite models (I only had 16GB unified RAM at the start), I was impressed enough that I splashed out $300 to double my RAM. I am happy that this one-time cost was enough to break through to smarter models and faster inference. No ongoing cloud costs!

1 comment

2ndorderthought 54 days ago

It's awesome. Even on a trash computer you can run a small model that works just about as well as anything else for basic questions, for free and with no privacy issues. It's gotta be the future.
