If you are not being paid by Apple, I feel sorry for you, 'cause that means you are so bought into the cult that you are delusional.
The 40-80 tok/sec figure is only for initial prompt processing, and only with the "medium" models, like Qwen3.6:27b. The actual token generation is in the 10 tokens/second range. That's very slow. And your MacBook Pro will stop being a LAP-top, because it will get very warm.
Meanwhile, my 2x3090s happily crank out ~100 tok/sec generation. Oh, and I can get 100 tok/sec on my phone as well, because I can just access Ollama on my home desktop over SSH from Termux.
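
If anyone wants to check the prompt-processing vs. generation split on their own box, here is a rough sketch. It assumes the final JSON from Ollama's /api/generate carries `prompt_eval_count`, `prompt_eval_duration`, `eval_count` and `eval_duration`, with durations in nanoseconds, which is how I understand the API. The helper names and the sample numbers are made up for illustration, not measurements from any machine.

```python
def tokens_per_second(count, duration_ns):
    """Convert a token count and a duration in nanoseconds to tokens/second."""
    if not duration_ns:
        return 0.0  # avoid dividing by zero when a field is missing
    return count / (duration_ns / 1e9)


def summarize(response):
    """Split an Ollama-style response dict into prompt and generation rates.

    Assumes the field names described above; missing fields count as zero.
    """
    return {
        "prompt_tok_per_sec": tokens_per_second(
            response.get("prompt_eval_count", 0),
            response.get("prompt_eval_duration", 0),
        ),
        "generation_tok_per_sec": tokens_per_second(
            response.get("eval_count", 0),
            response.get("eval_duration", 0),
        ),
    }


# Hypothetical response values, chosen only to show the arithmetic.
sample = {
    "prompt_eval_count": 600,
    "prompt_eval_duration": 10_000_000_000,  # 10 s spent reading the prompt
    "eval_count": 200,
    "eval_duration": 20_000_000_000,  # 20 s spent generating
}

rates = summarize(sample)
print(rates)
```

The point is just that the two rates come from different fields, so a headline tok/sec number means little unless you know which one it is.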