Hacker News new | ask | show | jobs
by NitpickLawyer 54 days ago
> a lot of the hype around running these models locally is bullshit. Sure, you can make it do something but certainly nothing useful or substantial.

There is certainly a lot of hype around local models. Some of it is overhype, some of it is just "people finding out" and discovering what cool stuff you can do. I suspect the post is a reply to the other one a few days ago where someone from hf posted a pic with them in the plane, using a local model, and saying it's really really close to opus. That was BS.

That being said, I've been working with local LMs since before chatgpt launched. The progress we've made from the likes of gpt-j (6B) and gpt-neoX (22B) (some of the first models you could run on regular consumer hardware) is absolutely amazing. It has gone way above my expectations. We're past "we have chatgpt at home" (as it was when launched), and now it is actually usable in a lot of tasks. Nowhere near SotA, but "good enough".

I will push back a bit on the "substantial" part, and I will push a lot on "nothing useful". You can, absolutely get useful stuff out of these models. Not in a claude-code leave it to cook for 6 hours and get a working product, but with a bit of hand holding and scope reduction you can get useful stuff. When devstral came out (24B) I ran it for about a week as a "daily driver" just to see where it's at. It was ok-ish. Lots of hand holding, figured out I can't use it for planning much (looked fine at a glance, but either didn't make sense, or used outdated stuff). But with a better plan, it could handle implementation fine. I coded 2 small services that have been running in prod for ~6mo without any issues. That is useful, imo. And the current models are waaay better than devstral1.

As to substantial, eh... Your substantial can be someone else's taj mahal, and their substantial could be your toy project. It all depends. I draw the line at useful. If you can string together a couple of useful tasks, it starts to become substantial.

1 comments

Can you share more on how your setup has changed over time for running these? Do you prompt them for code samples like some people do with ChatGPT or did you integrate them into your IDE or some kind of custom harness?
Sure. I usually work with devcontainers from vscode. They provide great integration ootb (port forwarding & stuff) and are ok for containing the agents for most cases. If you want to work on docker projects I also tried vagrant for vm with docker inside, and you instantiate the agent from vagrant.

For local models I used mainly cline and then roo code extensions. Roo was a bit better because it offered more customisation (prompts, tool choice, etc). I found that local models need shorter prompts and less tools to be effective. Unfortunately roo seems to be discontinued, no idea what I'll use after it stops working. Cline works fine for most of the cases ootb, especially if you run inference on a platform that supports good kv caching - I use vLLM.

For subscriptions I use their own harness, as you get the best bang for the buck. For 3rd party subscriptions that don't have their own harness I use opencode (I got a very cheap sub for GLM that I use for exploration and oss projects).