| That's almost exactly my setup and I'm very happy with its performance. I noticed recently that I started to prefer my local Qwen3.6 35B A3B and pi agent over Claude Code. Both fail at different tasks, and Qwen more so than Claude. But the way Qwen fails is much more straightforward. In writing tasks Qwens hallucinations and bullshitting are much easier to spot because it doesn't have the sleek vocabulary and wordsmithing skills to disguise its ignorance. In coding tasks that Qwen can't solve it often just goes into a tool calling doom loop that the pi harness can catch, whereas Claude attempts ever more convoluted and creative things just making more and more mess that takes forever to clean up. I think part of the story is that the tasks for which I use AI are fairly simple and maybe don't need a frontier model. But I wonder if "proper" developers had similar experience? |
The moment I'm trying something open-ended or ambitious, Claude/ChatGPT clearly take you to the goal quicker.
For things, where there's a way to build a knowledgebase though, the local llm definitely can be a true contender. Plus, having a big context and no worries about filling it over and over - you can get quite far.
I'm writing this, literally in between cooking a pasta, that the local llm ordered products for me online. I've built a grocery shopping skill, so that it roughly knows what I have in fridge (losely), my last 10 representative orders (general preferences plus rich info about shops and skus around me) and actual real-time in stock info. The last part has been my personal pet peeve for every product that promised cooking ingredient delivery (that is not packaged specifically for that).
This is what has been promised to us by every big tech company with an agent, and now a local llms actually solved that for me fully.