Hacker News
by sjanes 3 days ago
I've kind of given up on the routers for "free" inference. As you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

I've had some success turning my M1 Pro MacBook into a heating pad with Qwen 3.6 35B A3B MTP. Trying to use Gemini models "locally" resulted in a similar "short shrift" of effort, leading to mistakes and lots of turns. The reports of Fable being relentlessly "proactive" show you can go the other direction as well, if you have strong enough branding and effective invoicing.

3 comments

> I've kind of given up on the routers for "free" inference. As you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

Xiaomi MiMo ($6/mo: https://platform.xiaomimimo.com/token-plan) & Alibaba Qwen ($50/mo: https://www.alibabacloud.com/en/campaign/ai-scene-coding) have generous limits on fixed subscriptions.

So does Opencode Go ($10/mo: https://opencode.ai/go) for DeepSeek v4 Flash and MiMo 2.5.

That looks pretty nice. How does it compare cost-wise to just using OpenRouter?

The Go plan essentially gives you $50 of inference for $10 per month ($5 for the first month).

$60/mo currently: https://opencode.ai/docs/go/#usage-limits

Their limits are staggered: 5h (max $12), weekly ($30), monthly ($60).
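
As a rough sketch of how those staggered caps interact, assuming the three dollar figures above and treating each cap as a simple ceiling on spend within its window (the variable names are illustrative, not anything from Opencode's API):

```python
# Figures quoted in this thread, in dollars. The $10 is the Go plan's monthly price.
# Assumption: each cap is a plain ceiling on inference spend within its window.
PRICE_PER_MONTH = 10
CAP_5H, CAP_WEEKLY, CAP_MONTHLY = 12, 30, 60

# Fully used 5-hour windows needed to hit the weekly cap.
windows_to_weekly_cap = CAP_WEEKLY / CAP_5H

# Fully used weeks needed to hit the monthly cap.
weeks_to_monthly_cap = CAP_MONTHLY / CAP_WEEKLY

# Dollars of inference per dollar paid, if the monthly cap is fully used.
value_multiple = CAP_MONTHLY / PRICE_PER_MONTH

print(windows_to_weekly_cap, weeks_to_monthly_cap, value_multiple)
```

Under those assumptions, two and a half maxed-out 5h windows use up a week, and two maxed-out weeks use up the month.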

My mistake. You are correct.

> The reports of Fable being relentlessly "proactive"

For the curious: https://news.ycombinator.com/item?id=48498573 - “Claude Fable is relentlessly proactive”.

Tangent: did the MTP help you at all? I’ve tested that model back to back on my M1 Max MBP and the MTP version was actually marginally worse. I wonder if I didn’t use the right settings, although I tried several based on the obvious sources.