by deanc
56 days ago
This has been exactly my experience too. I've tried multiple harnesses (pi, claude code, codex) with multiple variants of qwen3.6 and gemma4 driven by both o mlx and ollama, and every single time I try to do anything meaningful I end up in a loop. This is on a 64GB MacBook Pro M3 Max. I really don't know what the hell people are doing locally, and I suspect a lot of the hype around running these models locally is bullshit. Sure, you can make it do something, but certainly nothing useful or substantial.
I ran 8 tests on a variety of open-weights models and Opus 4.7 (1mil ctx version), and the little dense model was right behind Opus: https://github.com/sleepyeldrazi/llm_programming_tests/tree/... Of note is that Opus was the only model to push back against the spec on the hardest challenge, saying 'that's not possible', even though the spec links to examples of it being done.
There may be problems with the mlx versions, as I haven't had any looping in all the testing I've done, which covers all my agentic and coding work over the last couple of days (since it dropped). I have had tool_call misses 4 or 5 times so far, which isn't ideal, but no looping. First I used it in pi-mono, and later, when I realized it's a serious model, I switched to opencode.
My setup is llama.cpp running on a 3090 in WSL, using the unsloth IQ4_NL quant with these flags:

  --ctx-size 128000 \
  --jinja \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --repeat-penalty 1.0 \
  --presence-penalty 0.0 \
  --threads 12 \
  --gpu-layers 99 \
  --no-warmup \
  --no-mmap \
  -fa on
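
For anyone who wants to reproduce that flag set, here is a minimal sketch that assembles it into a single command line. The flags and values are copied from the comment above; the binary name `llama-server`, the `--model` argument, and the `model.gguf` path are assumptions for illustration, since the comment does not say which llama.cpp binary or model file was used.

```python
import shlex

# Flags and values as listed in the comment above.
# A value of None marks a switch that takes no argument.
FLAGS = [
    ("--ctx-size", "128000"),
    ("--jinja", None),
    ("--temp", "0.6"),
    ("--top-p", "0.95"),
    ("--top-k", "20"),
    ("--min-p", "0.0"),
    ("--repeat-penalty", "1.0"),
    ("--presence-penalty", "0.0"),
    ("--threads", "12"),
    ("--gpu-layers", "99"),
    ("--no-warmup", None),
    ("--no-mmap", None),
    ("-fa", "on"),
]


def build_command(binary, model_path):
    """Return the argv list for the given binary and model file.

    Both arguments are placeholders: the original comment names neither.
    """
    argv = [binary, "--model", model_path]
    for flag, value in FLAGS:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    return argv


if __name__ == "__main__":
    # Hypothetical binary name and model path, for illustration only.
    print(shlex.join(build_command("llama-server", "model.gguf")))
```

Keeping the flags in one list makes it easy to diff against another setup (for example an mlx or ollama configuration) when trying to track down why one of them loops and the other does not.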