by Jeremy1026 86 days ago
I went from an M1 16GB to an M5 Pro 48GB. I'm running Qwen 3.5 on it locally. I've been sending it and Opus 4.6 the same prompts in identical copies of codebases, using Claude Code for both (using ollama to launch with Qwen). Local Qwen is about 4x slower than sending the request to Opus, and the results are not nearly as good either.

One task that I sent to both was to make a website to search transcription files generated from video files that were also provided. I wanted the transcriptions to display and be clickable, and clicking one should make the video skip to that point in playback. The Opus website looked nice and worked well. Qwen couldn't get the videos to play.
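
For anyone wondering what the core of that task looks like, here is a minimal sketch of the search half, not what either model produced. It assumes each transcription has already been parsed into segments with a start time in seconds; the `Segment` type, the helper names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video (assumed format)
    text: str


def search_segments(segments, query):
    """Return segments whose text contains the query, case-insensitively."""
    needle = query.casefold()
    return [s for s in segments if needle in s.text.casefold()]


def format_timestamp(seconds):
    """Render seconds as H:MM:SS for display next to each hit."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical sample data standing in for a parsed transcription file.
segments = [
    Segment(0.0, "Welcome to the demo."),
    Segment(12.5, "First we install the tool."),
    Segment(75.0, "Now we run the demo build."),
]

for hit in search_segments(segments, "demo"):
    print(format_timestamp(hit.start), hit.text)
```

On the page itself, clicking a hit would set the video element's `currentTime` to the segment's `start` value, which is the part Qwen never got working.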

Now, for day-to-day tasks, the M1 wasn't a slouch, but the M5 Pro is still a big step forward in terms of performance.

2 comments

That's a helpful insight. My prediction is that as it keeps getting more expensive for the big players to run these models, we will start to see some kind of hybrid workload where they offload the work of smaller agents to your computer while keeping the orchestration and planning running in the data centers.

So I think the investment in the extra hardware is worth it, even if you don't currently plan on running LLMs locally.

I get your point, but I also know that there are better ways to operationalize local AI. Your POV is still super helpful context. I feel like a lot of local-vs-cloud discussion stops at “slower and worse,” but the useful part is understanding where it broke down (model quality, tool use, runtime setup) rather than stopping at the difference in task performance between the two.