| HN Mirror



	by Karupan 533 days ago
	They are perfectly fine for certain people. I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32 GB at ~16 tok/sec. At least in my circle, people are budget conscious and would prefer using existing devices rather than paying for subscriptions where possible. And we know why they won't ship NVLink anymore on prosumer GPUs: they control almost the entire segment, so why give more away for free? Good for the company and investors, bad for us consumers.


acchow 533 days ago

> I can run Qwen-2.5-coder 14B on my M2 Max MacBook Pro with 32 GB at ~16 tok/sec. At least in my circle, people are budget conscious

Qwen 2.5 32B on OpenRouter is $0.16 per million output tokens. At your 16 tokens per second, 1 million tokens is about 17 continuous hours of output.

OpenRouter will charge you 16 cents for that.

I think you may want to reevaluate which is the real budget choice here.

Edit: to elaborate, the extra 16 GB of RAM on the Mac needed to hold the Qwen model costs $400, which is equivalent to roughly 1770 days of continuous output. And that assumes electricity is free.
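
For anyone who wants to check the arithmetic, here is a rough sketch of the comparison in Python. The function names are made up for illustration, and the inputs are just the figures quoted in this thread ($0.16 per million output tokens, 16 tok/sec, $400 for the extra RAM); electricity and input-token pricing are ignored, as in the comment.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def hours_to_generate(tokens: float, tokens_per_second: float) -> float:
    """Hours of continuous local generation needed to emit `tokens` tokens."""
    return tokens / tokens_per_second / SECONDS_PER_HOUR


def breakeven_days(hardware_cost_usd: float,
                   api_price_per_million_usd: float,
                   tokens_per_second: float) -> float:
    """Days of nonstop local output whose API cost would equal the hardware cost."""
    # How many million-token batches the hardware money would buy from the API.
    million_token_batches = hardware_cost_usd / api_price_per_million_usd
    # How long the local machine needs to produce that same number of tokens.
    total_hours = million_token_batches * hours_to_generate(1_000_000, tokens_per_second)
    return total_hours / HOURS_PER_DAY


# Figures quoted in the thread (assumptions, not measured values).
hours_per_million = hours_to_generate(1_000_000, 16)
days = breakeven_days(hardware_cost_usd=400,
                      api_price_per_million_usd=0.16,
                      tokens_per_second=16)

print(f"{hours_per_million:.2f} hours per million tokens")
print(f"{days:.0f} days of continuous output to break even")
```

Without rounding, a million tokens takes about 17.4 hours and the break-even lands near 1808 days; the 1770-day figure above comes from rounding to 17 hours first. Either way the conclusion is the same order of magnitude.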


Karupan 533 days ago

It's a no-brainer for me because I already own the MacBook and I don't mind waiting a few extra seconds. Also, I didn't buy the Mac for this purpose; it's just my daily device. So yes, I'm sure OpenRouter is cheaper, but I just don't have to think about using it as long as the open models are reasonably good for my use. Of course, your needs may be quite different.


oarsinsync 533 days ago

> OpenRouter will charge you 16 cents for that

And log everything too?


moffkalast 533 days ago

It's a great option if you want to leak your entire internal codebase to third parties.
