|
|
|
|
|
by dabockster
317 days ago
|
|
> We believe that all agents will long more like this in the future - long running, asynchronous, more autonomous. Specifically, we think that they will: > Run asynchronously in the cloud > cloud Reality check: https://huggingface.co/Menlo/Jan-nano-128k-gguf That model will run, with decent conversation quality, at roughly the same memory footprint as a few Chrome tabs. It's only a matter of time until we get coding models that can do that, and then only a further matter of time until we see agentic capabilities at that memory footprint. I mean, I can already get agentic coding with one of the new Qwen3 models - super slowly, but it works in the first place. And the quality matches or even beats some of the cloud models and vibe coding apps. And that model is just one example. Researchers all over the world are making new models almost daily that can run on an off-the-shelf gaming computer. If you have a modern Nvidia graphics card, you can run AI on your own computer totally offline. That's the reality. |
|
Of course, the line will always be pushed back as frontier models incrementally improve, but the quality is night and day between these open models consumers can feasibly run versus even the cheaper frontier models.
That said, I too have no interest in this if local models aren't supported and hope that's down the pipeline just so I can try tinkering with it. Though it looks like it utilizes multiple models for various tasks (planner, programmer, reviewer, router, and summarizer) so that only adds to the difficulty of the VRAM bottleneck if you'd like to load different models per task. So I think it makes sense for them to focus on just Claude for now to prove the concept.
edit: I personally use Qwen3 Coder 30B 4bit for both autocomplete and talking to an agent, and switch to a frontier model for the agent when Qwen3 starts running in circles.