by girvo 17 days ago
You top out at 20 tokens per second on hardware with memory bandwidth this low for any local model actually worth using. Doing the maths, it’s not financially worth it. It’s only worth it for privacy and control reasons.

I still love my GB10 Asus Spark-like, though!


I don’t understand your calculation, can you elaborate? At $25/Mtok output, assuming your 20 tok/s, running around the clock produces about 630M tokens a year, so I generated/saved (minus power costs) ~$15k in a year.

Granted, it won’t run 24/7, but over a couple of years, this is definitely cheaper.
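
For anyone who wants to check that arithmetic, here is a rough sketch of it. The 20 tok/s and $25/Mtok figures are the ones quoted above; the function name and the 25% utilization value are made up for illustration, and power costs are ignored.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def yearly_api_value(tokens_per_second, usd_per_mtok, utilization=1.0):
    """API-equivalent dollar value of the tokens generated locally in one year.

    utilization is the fraction of the year the machine is actually generating
    (1.0 means running 24/7).
    """
    tokens = tokens_per_second * SECONDS_PER_YEAR * utilization
    return tokens / 1_000_000 * usd_per_mtok


# Figures from the thread: 20 tok/s, priced at $25 per million output tokens.
full_time = yearly_api_value(20, 25.0)
# Hypothetical duty cycle: the box is busy a quarter of the time.
quarter_time = yearly_api_value(20, 25.0, utilization=0.25)

print(f"24/7:            ${full_time:,.0f} per year")
print(f"25% utilization: ${quarter_time:,.0f} per year")
```

Running 24/7 this comes out at roughly $15.8k a year, which matches the ~$15k figure; whether $25/Mtok is the right comparison price is a separate question.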

This can’t run any models that cost $25/Mtok lol. I think the fastest model it’ll reasonably run will be GPT-OSS 120B, which costs $0.05/Mtok.
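
To show how much the comparison price matters, here is a quick sketch that values the same 24/7 output at both prices mentioned in this thread. Both prices are taken from the comments as stated, not checked against any provider's price list, and the variable names are just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
TOKENS_PER_SECOND = 20  # throughput figure quoted in the thread

# Millions of tokens generated in a year of nonstop output.
mtok_per_year = TOKENS_PER_SECOND * SECONDS_PER_YEAR / 1_000_000

# Prices per million output tokens, as claimed by the two commenters.
prices_usd_per_mtok = {
    "frontier-priced model": 25.00,
    "GPT-OSS 120B (as claimed)": 0.05,
}

yearly_value = {
    label: mtok_per_year * price for label, price in prices_usd_per_mtok.items()
}

for label, value in yearly_value.items():
    print(f"{label:28s} ${value:,.2f} per year")
```

At $0.05/Mtok the same year of output is worth about $32 in API terms instead of about $15.8k, a 500x difference, so the whole disagreement comes down to which hosted model is the fair comparison.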

This is a laptop for CUDA devs and AI larpers.