| I have come at this from a slightly different angle. I am a fully burned-out freelancer (over the last couple of years so severely and totally that I thought I had early-onset dementia, and I am still not sure I don't). I don't really have an off-ramp to anything else yet, but the sea-change in the industry has been contributing to my feeling that I should knock it on the head. I need to get past a broad understanding of AI to a deep one, but I have to find a way to do that which sits well with freelancer ethics (sustainability, stability, control of destiny).

So I decided to start from the operating principle that ultimately this stuff is just going to be local: models will eventually reach a practical level for most tasks, and technological progress guarantees that they will eventually run on desktops. I decided to learn how to run models locally properly, see how far I get with opencode (and Pi and Zed experiments), and grow outwards from there to metered models (opencode go, OpenRouter, etc.). Knowledge first: what can I do that meaningfully changes my outcomes and confidence, with no cost and no exposure to sudden change?

I have a secondhand M1 Max (excellent GPU bandwidth), and I am really shocked to find that, arguably, that level of practicality is already here. Qwen 3.6 35B can really do a lot. And (not sure if you have tested it) in some ways I think Gemma 4 26B is better, particularly for more commonplace dev tech: it is very knowledgeable about the low-end web dev stack that is most common (WordPress, PHP, MySQL). I have been getting 75 tokens/sec with a GGUF build of Gemma 4 26B QAT plus MTP. (I can't get anywhere close with MLX, for some reason.) I get a similar sort of speed with an MLX build of Qwen 3.6 35B. I have a sneaking suspicion that llama.cpp may now be faster than MLX on this older kit, so I might see what llama.cpp can do with Qwen, too.
Not blazing fast, but fast enough that there are plenty of experiments and small jobs I can do before I even get to using Big Pickle! |