GLM-5.2 meets my needs for "thinky" tasks, which for me means code and documentation reviews, technical chats and rubber ducking. (I've tried agentic coding and gone back to writing by hand; besides ethical and skill-atrophy concerns, I mostly do hardware design and have not been satisfied with any model's RTL output.) API rates are cheaper than Haiku, with benchmarks around Opus 4.6. I've managed to run GLM-5.2 at home; it's very slow, but it's still neat that this is possible. I personally find it less grating to talk to than Opus.
I use a local Qwen3.6-35B-A3B (@ Q4_K_XL) for my documentation search harness. It works well for its assigned task, which is:
- I dump in a bucket of PDFs and/or source code.
- I ask a question.
- Qwen greps, fuzzy-searches, views rendered PDF pages to check diagrams, possibly gives up and reads everything, and possibly gives up on that too and writes its own scraper with PyMuPDF in a Pyodide sandbox.
- Qwen gives me an answer consisting mostly of citations and links back into the source material.
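To make the grep-then-fuzzy-search step concrete, here is a rough sketch of the kind of tool such a harness might expose to the model. It is not my actual harness: `search_pages`, the `corpus` layout and the 0.8 cutoff are all made up for illustration, and it assumes page text has already been extracted (for example with PyMuPDF's `page.get_text()`).

```python
import re
from difflib import get_close_matches


def search_pages(corpus, query, max_hits=5):
    """Return (document, page_number, snippet) citations for a query.

    corpus is a hypothetical layout: {doc_name: [page_text, ...]},
    with page text assumed to be extracted ahead of time.
    """
    hits = []
    pattern = re.compile(re.escape(query), re.IGNORECASE)

    # First pass: plain case-insensitive grep over every page.
    for doc, pages in corpus.items():
        for page_no, text in enumerate(pages, start=1):
            match = pattern.search(text)
            if match:
                start = max(0, match.start() - 40)
                snippet = text[start:match.end() + 40].strip()
                hits.append((doc, page_no, snippet))
    if hits:
        return hits[:max_hits]

    # Fallback: fuzzy-match the query against individual words,
    # which catches typos and near-miss spellings.
    for doc, pages in corpus.items():
        for page_no, text in enumerate(pages, start=1):
            words = sorted(set(re.findall(r"\w+", text.lower())))
            close = get_close_matches(query.lower(), words, n=1, cutoff=0.8)
            if close:
                hits.append((doc, page_no, close[0]))
    return hits[:max_hits]


corpus = {
    "manual.pdf": [
        "The TLB is invalidated by TLBI.",
        "Exception levels EL0 to EL3.",
    ]
}
exact = search_pages(corpus, "tlbi")
fuzzy = search_pages(corpus, "excepton")
```

The point of returning document and page number with every hit is that the model's answer can be built from citations rather than from what it remembers.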
This approach with local Qwen can extract useful answers from the Armv9-A manual, which at 17k pages is possibly too big for any context window. Qwen has just enough knowledge baked in to know what to search for and understand what it's looking at. A more knowledgeable model would be a waste because even Fable makes shit up, and I want citations, not hallucinations.
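For a sense of scale, a back-of-envelope estimate. Both the tokens-per-page figure and the context window size below are assumptions picked for illustration, not measurements of the manual or of any particular model.

```python
def fits_in_context(pages, tokens_per_page, context_tokens):
    """Estimate total tokens for a document and whether it fits in a window."""
    total = pages * tokens_per_page
    return total, total <= context_tokens


# Assumed: ~500 tokens per page and a hypothetical 1M-token context window.
total, fits = fits_in_context(17_000, 500, 1_000_000)
print(total, fits)
```

Under those assumptions the manual comes to 8.5M tokens, several times the window, which is why the harness searches instead of loading everything.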
DeepSeek v4 Flash gets an honourable mention: somehow all three of fast, capable and cheap. Zero-data-retention providers are available for both GLM-5.2 and DSv4F. I trust OpenRouter ZDR about as much as I trust Anthropic ZDR, since I can audit neither.
Overall I don't miss my Claude subscription, but take what I say with a grain of salt. I was just a Pro subscriber, not a heavy user like some other folks here.