I think its worth noting that if you are paying for electricity Local LLM is NOT free. In most cases you will find that Haiku is cheaper, faster, and better than anything that will run on your local machine.
This 35B-A3B model is 4-5x cheaper than Haiku though, suggesting it would still be cheaper to outsource inference to the cloud vs running locally in your example
Running at a full load of 1000W for every second of the year, for a model that produces 100 tps at 16 cents per kWh, is $1200 USD.
The same amount of tokens would cost at least $3,150 USD on current Claude Haiku 3.5 pricing.