Hacker News
by dust42 68 days ago
I use it with the pi agent and have stopped renting tokens. It works much better for me than Claude Code, on an M1 Max with 64GB. With oMLX at 16k context, this model does prompt processing (PP) at 919.9 tok/s and token generation (TG) at 54.7 tok/s. You have to manage the context, but the better you manage it, the more focused the output is. I use it with thinking turned off.
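To put those two numbers in perspective, here is a rough back-of-the-envelope sketch of what a single turn costs in wall-clock time. It assumes the PP and TG rates stay constant at the values quoted for 16k context (in practice they vary with context length), that the whole prompt is processed from scratch with no prompt cache, and that "16k" means 16,384 tokens. The function name `turn_latency_s` and the 500-token reply length are made up for illustration.

```python
# Rough per-turn latency estimate from the quoted throughput numbers.
# Assumptions (not from the original comment): constant rates, no prompt
# caching, and a hypothetical 500-token reply.

PP_TOK_S = 919.9  # prompt processing speed quoted at 16k context
TG_TOK_S = 54.7   # token generation speed quoted at 16k context


def turn_latency_s(prompt_tokens, reply_tokens, pp_tok_s=PP_TOK_S, tg_tok_s=TG_TOK_S):
    """Return (prefill, decode, total) time in seconds for one turn."""
    prefill = prompt_tokens / pp_tok_s  # time to read the whole prompt
    decode = reply_tokens / tg_tok_s    # time to generate the reply
    return prefill, decode, prefill + decode


if __name__ == "__main__":
    for prompt_tokens in (2_048, 8_192, 16_384):
        prefill, decode, total = turn_latency_s(prompt_tokens, reply_tokens=500)
        print(
            f"{prompt_tokens:>6} prompt tokens: "
            f"prefill {prefill:5.1f}s + decode {decode:4.1f}s = {total:5.1f}s"
        )
```

Under these assumptions a full 16k prompt takes noticeably longer to read than a short reply takes to generate, which is one more reason keeping the context small pays off.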