by bigiain 226 days ago
> In 2022 the best available model was GPT-3 text-davinci-003 at $60/million input tokens.

> GPT-5 today is $1.25/million input tokens - 48x cheaper for a massively more capable model.

Yes - but.

GPT-5 and all the other modern "reasoning models" and tools burn through way more tokens to answer the same prompts.

As you said:

> We're beginning to find more expensive ways to use the models though. Coding Agents like Claude Code and Codex CLI can churn through tokens.
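To put rough numbers on that trade-off, here is a minimal sketch using only the two input-token prices quoted above. The workload (2,000 tokens on the old model, 20x that for an agentic run) is a made-up assumption for illustration, and output-token pricing is ignored because the thread doesn't give it.

```python
# Prices quoted in the thread, in USD per million input tokens.
DAVINCI_003_PRICE = 60.00
GPT5_PRICE = 1.25


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` input tokens at the given price."""
    return tokens * price_per_million / 1_000_000


def break_even_multiplier(old_price: float, new_price: float) -> float:
    """How many times more tokens the cheaper model can consume before
    a request costs as much as it did on the older model."""
    return old_price / new_price


# Hypothetical workload (an assumption, not a measurement):
# 2,000 tokens on the old model, 20x as many on an agentic run.
old_cost = cost_usd(2_000, DAVINCI_003_PRICE)
new_cost = cost_usd(2_000 * 20, GPT5_PRICE)
multiplier = break_even_multiplier(DAVINCI_003_PRICE, GPT5_PRICE)

print(f"old: ${old_cost:.2f}  new: ${new_cost:.2f}  break-even: {multiplier:.0f}x tokens")
```

So on input pricing alone, a request only gets more expensive once the newer model chews through more than 48x the tokens. Whether a given agent session crosses that line depends on the workload.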

Right now, it feels like the cost of using "frontier models" is staying the same as it has been for the entire ~5 year history of the current LLM/AI industry. But older models these days are, by comparison, effectively free.

I'm wondering when/if there'll be an asymptotic flattening, where new frontier models are only insignificantly better than older ones, and running some model off Huggingface on a reasonably specced-up Mac Mini or gaming PC will provide AI coding assistance at basically electricity and hardware depreciation prices?

1 comments

That really is the most interesting question for me: when will it be possible to run a model that is good enough to drive Claude Code or Codex CLI on consumer hardware?

gpt-oss-120b fits on a $4,000 NVIDIA DGX Spark and can be used by Codex - it's OK but still nowhere near the bigger ones: https://til.simonwillison.net/llms/codex-spark-gpt-oss

But... MiniMax M2 benchmarks close to Sonnet 4 and has 230B parameters - too big for one Spark, but it can run on a $10,000 Mac Studio.

And Kimi K2 runs on two Mac Studios ($20,000).

So we are getting closer.
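A back-of-the-envelope way to check those "fits on" claims is parameter count times bits per weight. The sketch below is only an estimate under stated assumptions: the memory sizes (128 GB for the Spark, 512 GB for a top-spec Mac Studio), the 4-bit quantization, the ~1000B parameter count for Kimi K2, and the 80% usable-memory headroom are all my assumptions rather than figures from the thread, and it ignores KV cache and activations.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         memory_gb: float, headroom: float = 0.8) -> bool:
    """True if the weights fit in `headroom` of the machine's memory.

    The 0.8 default is an assumed allowance for the OS, KV cache and
    activations, not a measured value.
    """
    return weights_gb(params_billions, bits_per_weight) <= memory_gb * headroom


# Assumed memory sizes in GB (not stated in the thread).
SPARK_GB = 128
MAC_STUDIO_GB = 512

# (model, parameters in billions); the Kimi K2 size is an assumption.
models = [("gpt-oss-120b", 120), ("MiniMax M2", 230), ("Kimi K2", 1000)]

for name, size in models:
    print(
        f"{name}: {weights_gb(size, 4):.0f} GB at 4-bit |"
        f" Spark: {fits(size, 4, SPARK_GB)}"
        f" | Mac Studio: {fits(size, 4, MAC_STUDIO_GB)}"
        f" | 2x Mac Studio: {fits(size, 4, 2 * MAC_STUDIO_GB)}"
    )
```

Under those assumptions the estimate lines up with the hardware listed above: 120B fits on one Spark, 230B needs the Mac Studio, and a ~1T model needs two of them.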

Also, at some point the Blackwell-generation DGX Station is supposed to ship with 768 GB of unified memory. It will presumably come with a high five-figure price tag, and it should be able to run most open-source models with little need to trade off quality for speed.

Trouble is, there's not even much hype surrounding the launch yet, much less shipping hardware. Which seems kind of ominous.