| HN Mirror

by yogthos 5 days ago

I think it's almost certain that we'll be running local models by default within a few years. In my opinion, the quality of small models has been improving at an astonishing rate. My favorite example is Qwen3.6-27B, which you can run on a laptop: it outperforms Qwen3.5-397B, a flagship model released just in February that required a commercial-grade server. https://qwen.ai/blog?id=qwen3.6-27b

I fully expect local models comparable in performance to current frontier models to appear in the near future. A lot more can also be done with the harness, which in my opinion is under-explored territory right now. For example, ATLAS does some clever tricks in this area: https://github.com/itigges22/ATLAS

I started working on my own harness and have also noticed a significant improvement in model capability with it: https://dirge-code.github.io

Apple seems to be one of the few companies to have realized that the future is likely local, and they've been focusing on optimizing hardware for that while everybody else still seems stuck in a model-as-a-service paradigm.

1 comments

baron3dl 5 days ago

I think Apple's tech-heavy user base and vertically integrated hardware/network/software mega architecture positioned them perfectly to beat the rest of the market to 1st runner up. The competition knows, they just can't move that fast.

> I started working on my own harness and also notice a significant improvement in model capability with it https://dirge-code.github.io

You should mine your session logs for examples of scenarios that demonstrate this improvement. If you can characterize it in a time series metric, like tokens/feature, as you applied improvements, then you're offering a receipt.
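To make that concrete, here's a rough sketch of what a tokens/feature time series could look like. Everything in it is an assumption for illustration: the `Session` record, its field names, and the sample numbers are made up, and a real harness would log something different that you'd have to parse into this shape first.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Session:
    """Hypothetical record parsed from one harness session log."""
    day: date      # when the session ran
    tokens: int    # total tokens consumed (prompt + completion)
    features: int  # features completed during the session


def tokens_per_feature(sessions):
    """Return (day, tokens/feature) pairs sorted by day.

    Sessions on the same day are summed first; days with no completed
    features are skipped to avoid dividing by zero.
    """
    totals = {}
    for s in sessions:
        tokens, features = totals.get(s.day, (0, 0))
        totals[s.day] = (tokens + s.tokens, features + s.features)
    return [(day, t / f) for day, (t, f) in sorted(totals.items()) if f > 0]


# Made-up sample data, not real measurements.
sessions = [
    Session(date(2025, 1, 1), 120_000, 2),
    Session(date(2025, 1, 1), 60_000, 1),
    Session(date(2025, 1, 8), 90_000, 2),
]
series = tokens_per_feature(sessions)
for day, ratio in series:
    print(day.isoformat(), round(ratio))
```

If the ratio trends down as harness changes land, that's the receipt. It only means something if "feature" is counted the same way across sessions, though.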

yogthos 5 days ago

Yeah, that's a good idea. I wasn't really rigorous about tracking token usage metrics when I started. I was thinking I could solve the same tasks with opencode too and track the metrics for both.
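A side-by-side comparison like that could be as simple as the sketch below. It assumes you already have, for each harness, a mapping from task id to tokens spent on tasks it actually solved. The function name, the dict shape, and the numbers are all hypothetical, and nothing here reflects how opencode or any other tool reports usage.

```python
from statistics import median


def compare_harnesses(runs_a, runs_b):
    """Compare token usage on tasks that both harnesses solved.

    runs_a, runs_b: dict mapping task id -> tokens used (solved tasks only).
    Returns the shared task ids and the median of per-task ratios a/b,
    so a value below 1.0 means harness A tended to use fewer tokens.
    """
    shared = sorted(runs_a.keys() & runs_b.keys())
    ratios = [runs_a[t] / runs_b[t] for t in shared]
    return {
        "shared_tasks": shared,
        "median_ratio": median(ratios) if ratios else None,
        "only_a": sorted(runs_a.keys() - runs_b.keys()),
        "only_b": sorted(runs_b.keys() - runs_a.keys()),
    }


# Made-up numbers for illustration only.
mine = {"t1": 1000, "t2": 3000, "t3": 500}
other = {"t1": 2000, "t2": 3000}
report = compare_harnesses(mine, other)
print(report)
```

Restricting the ratio to tasks both sides solved keeps a harness from looking cheap just because it gave up early; the `only_a` / `only_b` lists show which tasks were solved by just one of them.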