Hacker News
by fredcallagan 54 days ago
Judging by how things are moving (pricing models, limits, patchy harness updates), it feels like the real salvation will be a combination of more mature open-source (OS) models and some open-source harness setup like OpenCode or similar. I'm feeling like OS models are nearly there, and with the proper setup and harness they might already be there. What are the general thoughts on this?
2 comments

I only started playing around with local inference a couple weeks ago. Prior to that I was just using Gemini via web since it came with my Workspace subscription, but I did not want to be reliant on the cloud.

Others will have a better idea since they've been messing around with local inference longer than I have, but I am quite impressed with the models I have been loading on my laptop with only an iGPU. As of this week I no longer feel like I am playing second fiddle with slow inference and small models. Gemma 4 (and maybe Qwen3.5, haven't tried it yet) seems to have changed the game this month!

Even after trying some absolutely shiiiiite models (I only had 16GB of unified RAM at the start), I was impressed enough that I splashed out $300 to double my RAM. I am happy that this one-time cost was enough to break through to smarter models and faster inference. No ongoing cloud costs!

It's awesome. Even on a trash computer you can run a small model that works just about as well as anything else for basic questions, for free and with no privacy issues. It's gotta be the future.
I really think the future is agent harness kits like `itayinbarr/little-coder`. It's a small, minimal, customizable series of pi-coding-agent extensions with some specialized deterministic logic to "heal" common small-LLM errors, like getting stuck in thinking loops and making syntax errors in tool calls.

This one has "generic healing" for issues present in the current generation of small local LLMs, but if what we see from frontier LLMs generalizes, "optimized healing" for the quirks of your chosen local LLM would be more useful.
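
To make the "healing" idea concrete, here is a rough sketch of what deterministic repair might look like. This is not taken from little-coder; the function names, the JSON tool-call shape, and the thresholds are all assumptions for illustration. It covers the two failure modes mentioned above: malformed tool-call syntax and repeated "thinking" text.

```python
import json
import re
from collections import Counter


def heal_tool_call(raw: str):
    """Hypothetical helper: try to recover a JSON tool call from sloppy output.

    Returns the parsed dict, or None if the text cannot be repaired.
    The repairs are naive (they ignore braces and commas inside strings).
    """
    # Drop surrounding whitespace and any backtick fence around the JSON.
    text = raw.strip().strip("`").strip()
    # Discard anything before the first "{", such as a "json" language tag.
    start = text.find("{")
    if start == -1:
        return None
    text = text[start:]
    # Remove trailing commas that sit right before a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    # Append closing braces the model forgot to emit.
    text += "}" * max(0, text.count("{") - text.count("}"))
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def is_thinking_loop(text: str, window: int = 8, max_repeats: int = 3) -> bool:
    """Hypothetical helper: flag text where the same run of `window` words
    occurs at least `max_repeats` times. Both thresholds are guesses."""
    words = text.split()
    grams = Counter(
        tuple(words[i:i + window]) for i in range(len(words) - window + 1)
    )
    return any(count >= max_repeats for count in grams.values())
```

A harness could run `heal_tool_call` on every tool call before rejecting it, and abort or re-prompt when `is_thinking_loop` fires. "Optimized healing" would then mean swapping in repairs and thresholds tuned to one specific model's habits.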