I use opencode with local Qwen 3.6 or GLM 4.7 models, running in llama.cpp, and am very happy with it for a lot of coding and code analysis work. Infinite tokens!
I have a Macbook Pro with M1 Max processor. I build llama.cpp from main every week or so, which I'm pretty sure includes Apple's 'metal' acceleration. That's it tho.