Not paying per token? Not sending my code to someone else's servers for inference? That's the stuff of sweet dreams for a stingy, paranoid solopreneur like me.
If I could run a local model comparable to even Sonnet 4.6 without shelling out $50K in hardware, I'd do it in a heartbeat. But all I have is a 32 GB of RAM and an old RTX 4080.
Or am I not up to speed? Are there decent coding models that can run on dev laptops? Not that that's what you were suggesting by recommending a local model, necessarily; just curious.
I am trying to figure this out too... what I am seeing is that the local models like Qwen 3.5 family that fit on hardware like yours handle ambiguity poorly. But are capable of emitting complete apps too.
I do love using local models when I can, but qwen-35B is the best model I can run, and while its an insanely good local model, it does not compare to the big ones.
If I could run a local model comparable to even Sonnet 4.6 without shelling out $50K in hardware, I'd do it in a heartbeat. But all I have is a 32 GB of RAM and an old RTX 4080.
Or am I not up to speed? Are there decent coding models that can run on dev laptops? Not that that's what you were suggesting by recommending a local model, necessarily; just curious.