Hacker News
by Tiberium 32 days ago
This is quite a misleading title because this is the raw API cost, but he (obviously) has unlimited usage as an OpenAI employee. Moreover, if you use e.g. the $200 Codex sub, you get roughly $5k-$6k of monthly API usage, if not more, provided you use up your allowance every week, which shows that the raw API cost is (likely) not what it costs OpenAI, unless they're subsidizing all this.

He did clarify that it was with fast mode. Without fast mode it'd "only" be $300k in raw API cost, or ~60 $200 Codex subscriptions.
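
A quick sketch of the arithmetic behind the "~60 subscriptions" figure, using only the numbers quoted above. It assumes the $300k and the $5k-$6k allowance cover the same period, which the comment does not state; the function name is made up for illustration.

```python
import math

def subscriptions_needed(raw_api_cost_usd: float, api_value_per_sub_usd: float) -> int:
    """Subscriptions whose combined API-equivalent usage covers the raw API cost."""
    return math.ceil(raw_api_cost_usd / api_value_per_sub_usd)

RAW_API_COST_USD = 300_000  # non-fast-mode figure quoted above
SUB_PRICE_USD = 200         # price of one Codex subscription, as quoted above

# The ~$5k-$6k range of API-equivalent usage per subscription (assumed same period).
for api_value in (5_000, 6_000):
    n = subscriptions_needed(RAW_API_COST_USD, api_value)
    print(f"${api_value:,} per sub -> {n} subs, ${n * SUB_PRICE_USD:,} in subscription fees")
```

At the low end of the range this gives the ~60 subscriptions mentioned; at the high end it would be closer to 50.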


Hey guys, I’m super good at using tokens.

Business: Amazing, that’s great what did you do?

I ran 50 instances and had them all fix the same bugs at the same time and then analyzed the results of all 50 runs to have AI score each of the attempts, then sort them, then compare them to each other in a round robin tournament style double elimination to ensure I got the best result. Then I had AI convert this into a skill, and then ran all 50 attempts again and repeated the process to ensure that I had the absolute best result. It was amazing and I used 1.3 billion tokens!

Business: That is amazing! What did you fix?

A spelling mistake on the About page.

This is the best use of tokens I've ever read. I'm building this skill as we speak to use on our Enterprise Claude account.

Wish me luck on a raise!

Claude, somewhere in this codebase I've mispelled a common word; the word is also a homophone and further, is easily confused with another word that has three r's; please start up a subagent for each file and count the r's and verify how many r's there are's; if there's three, then make sure to review potential homophones and check that I've spelt the correct worrrd incorrectly correct.
Peter Steinberger is of course currently employed by OpenAI and it probably benefits them for him to find ways for their customers to do that.
Sounds like a brilliant promo project!
How is it misleading if this would be the consumer's cost?

Eventually Codex's subscription subsidization will diminish to near-zero, like the rest of the providers.

It's extremely important that people understand how expensive these models currently are. Even $300k in raw API costs is alarming for the output.

> How is it misleading if this would be the consumer's cost?

Because it does not say “equivalent of”, it literally says he spent money that he did not spend

This. If I go up to my boss and say “I spent $10000 but it only cost us $1000” then I spent $1000.
Depends on elasticity, if you could have easily sold that $1000 worth of product and made $10k, then you spent $10k.
No. Then you missed the opportunity to make 9k.
If anything it cost more than the title, because customer costs are wildly subsidized.

So yeah, it's misleading, but in the other direction.

Inference is highly marked up. Total costs including training may be subsidized (in a sense, since the AI companies are widely reported not to break even as yet).
We know how expensive the (Chinese) models are to run, because there are a hundred inference providers selling them cheaply and competitively.

The money going to the American model companies is not going to their hosting costs.

Peter shows the near-term future. The raw consumer API price is arbitrary. (The frontier labs can put a 100x markup on it to cover other operational expenses.) The true cost of inference with same-capability models keeps dropping at dizzying rates, especially at data-center batch sizes. (Due to both NVIDIA hardware and algorithmic changes.) So the developments that Peter can achieve today with internal support from OpenAI will be doable by anyone in a few years without breaking the bank.
But.... why? Like I read his thing on how he spends the tokens [0] and it sounds like satire.

He has agents write shitty code for features other agents think other people want, then has it reviewed by other agents in hopes of catching bugs that the first agent put there, then has some more agents try to find security bugs in the now double-agented code to make it triple-agented and at the end of the day, he spent a shitton of tokens, probably emitted enough carbon to heat our planet by another degree, and has a feature nobody really asked for that might or might not work.

He then has the sense of humor to call this grotesque process "incredibly lean".

What's the point in all of this? What problems is this solving? Who's benefiting?

[0] https://xcancel.com/steipete/status/2055405041843052792

I don’t use openclaw myself anymore, but this agonizing is thin and unbearable. He did a thing. People use the thing. He got paid for the thing. He iterates the thing. What’s hard to understand about this?

The moral issues around consumption and climate impact are not his alone, and are not unique to his endeavor. Every company with an enterprise LLM agreement has a share, for instance.

> I don’t use openclaw myself anymore

Firstly, who TF would use that crap in the first place at all? Yeah, he did some crap he got paid for. So did the people who created the addictive algorithms for social media or the creators of the brainrot videos that infest kids' minds. Should we applaud them too?

You can hate it, but pretending it has no value isn’t a meaningful counter, esp. given its user base. Garry Tan built GBrain on it. Poor logical fallacy-ing on your part.
>He then has the sense of humor to call this grotesque process "incredibly lean".

> What's the point in all of this? What problems is this solving? Who's benefiting?

The economy doesn't work like how you think it does. It's not central planning. All the usages aren't detailed in a specification, submitted for approval to 100 agencies and then allowed to be used.

It shows lack of intellectual curiosity to not engage deeply with obviously profound technology and what the implications are. I find this exercise helpful.

Peter is predicting how LLMs will be used in the future when the prices go down. And they will definitely go down. I think his predictions are correct and we will definitely have something similar to OpenClaw.

> The economy doesn't work like how you think it does. It's not central planning.

I'm aware. That is in fact my central critique. The way it works is incredibly wasteful of our limited resources, as illustrated by this guy burning through fuel during a time of crisis for no perceptible gain.

> It shows lack of intellectual curiosity to not engage deeply with obviously profound technology and what the implications are.

The "obviously profound" is an assertion without proof.

The rest I agree with, we should engage with the implications of burning through energy to build features that bots think humans want, but nobody actually asked for, all while climate scientists are telling us we're heading for the apocalypse. It is intellectually incurious to just ignore the questions of why and at what cost, maybe even dangerously so.

> The way it works is incredibly wasteful of our limited resources

You should try playing the game “workers and resources”; it’s a simcity like game, but based in the Soviet system of central planning, not capitalism. It will make you loathe the inefficiencies in central planning.

> what the implications are

like one bot finding similar issues and PRs, then another bot closing issues for "lack of activity", meanwhile people are reacting and pleading to speak to a real human?

Congrats builders of the future, you've turned software development into automated voice systems.

Mario Zechner wrote the main part of this IP laundering application.

I didn't know that studying photocopiers is suddenly linked to "intellectual curiosity". Being a photocopier maintenance guy was always considered boring.

What you put on top of the machine was intellectually interesting.

But this is okay?

“He has /people/ write shitty code for features other /people/ think other people want, then has it reviewed by other /people/ in hopes of catching bugs that the first /people/ put there, then has some more /people/ try to find security bugs in the now /double-peopled/ code to make it /triple-peopled/ and at the end of the day, he spent a shitton of /money, the people/ probably emitted enough carbon to heat our planet by another degree, and has a feature nobody really asked for that might or might not work.”

Honestly sounds like a normal tech company to me. Just with much dumber “people” who are getting exponentially smarter, eventually never die, eventually never forget.

You have to skate to where the puck is going, not where it is.

> Just with much dumber “people” who are getting exponentially smarter

They haven't gotten any smarter yet, let alone exponentially smarter. They are still the same dumb parrots that they were in the beginning.

With the rate our planet is heating up, there may be nobody left to skate after the puck.
Do not equate people with bullshit LLMs, please.
Peter shows shit. What did Peter meaningfully achieve? What additional revenue is he creating? ah yes - shit and more shit on all accounts as it seems.
>OpenClaw hit 346K GitHub stars in under five months. 38 million monthly visitors, 3.2 million active users, 44,000+ ClawHub skills, 500K+ running instances, and 180 startups generating $320K+/month. OpenAI acquired the project in February. (https://openclawvps.io/blog/openclaw-statistics)
Let me state it again in plain language: How much revenue did the project create and what economic or societal value in general does it create? Gamification bullshit "achievements", like StackOverflow badges and GitHub stars ARE NOT VALUE.
the grift economy requires hype men. Keep up bro.
Even at unlimited budget, there is a crossover where outsourcing thinking to the machine costs more than the machine.

What I mean by this:

1. Intern, analyst, junior, or offshore level coding is cheaper when done by the machine.

// Side note: There is good reason the industry invests in suboptimal output from this set which moves to the "cost" column when using an LLM, but nobody's accounting for that.

2. For the interns, analysts, junior, or offshoring to do the right thing costs a multiple of the coding effort: the PdM/PjM stuff of course, but also the Stakeholder, Product Owner, Architect, Principal Engineer, QA, and SRE stuff.

3. If you are not a principal or staff level engineer, you are likely unqualified to catch and fix the errors LLMs make across engineering, much less across these other PDLC (product development lifecycle, which includes SDLC and SRE) loops.

4. For LLM output to be useful, your 'harness' has to incorporate all of that as well, which because it's so much harder than transliterating spec-to-code, balloons tokens exponentially.

5. Today it is faster, more efficient, and costs less, to work with LLMs "XP" (eXtreme Programming) style, pairing with the LLM actively co-creating and co-reviewing, steering for more effective turns.

So, your options are:

- ship garbage while costing less than a median first world SWE

- pair with the LLM actively for the benefits of XP

- add enough harness and steering that the LLM costs more than SWEs, and still needs a human loop “move fast and break things to find out what's broken” style

I would expect that within a couple years, these other disciplines can be baked in enough the machine costs less for everything but surprises.

> I would expect that within a couple years, these other disciplines can be baked in enough the machine costs less for everything but surprises.

They already are. I’m successfully using frameworks like bmad to deliver complex apps at that level. My job is to manage the see, as, ux, sre processes and catch errors.

I spend more time refining PRDs, epics, and stories than I do elbows deep in code.

If I don’t like the output of a story, I nuke it, change the story, and have the flanker try again. I’m using the open source GLM, Kimi, and DeepSeek models. I expect the full pipeline to be good enough by the end of the year.

> I spend more time refining PRDs, epics, and stories than I do elbows deep in code.

And do you enjoy this more than writing code? I used to look forward to writing code, solving these little optimization puzzles, learning, and staying sharp. Working with agents is dreadful in comparison. They lie, rarely learn, and I feel like a proctor.

Sure, you sometimes get to see something amazing, but usually I am just very annoyed by their performance and ever-changing but never-ending billing issues. First, with Claude Code, now with Codex, which was fine for a minute, but now I am out of tokens for the majority of time. (I don't have the income for those Pro INTx plans.)

I don't enjoy writing authN code, or frontend code, or all the myriad bits of glue converting between one thing and another.

Now, I'm master of about a thousand lines of pricing code plus documentation and research which actually matters. The AI can handle the rest as a very skilled junior with a TBI.

You absolutely are a proctor, or senior manager. The AI is the smartest most well read junior you will ever meet, but don't go out of its happy path.

As you go out of the commonly read happy path for CRUD apps, you'll have to get more and more involved. I wouldn't write a new kernel design with AI right now, I might write a Linux kernel driver with it though.

I think it's less misleading this way because every other reader would have to pay $1.3M to emulate his workflow for a similar size project. His discounted internal costs are relevant only to OpenAI.
I did mention that you could use ~60 $200 Codex accounts to emulate his workflow without /fast, or 2.5x that if you used /fast. Not $1.3M
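
Rough numbers for that comparison, as a sketch only. It takes the ~60 accounts, the 2.5x /fast multiplier, the $200 price, and the $1.3M headline figure from this thread at face value, and assumes they all refer to the same period; the variable names are illustrative.

```python
BASE_ACCOUNTS = 60             # ~60 Codex accounts without /fast (from the thread)
FAST_MULTIPLIER = 2.5          # "2.5x that if you used /fast" (from the thread)
SUB_PRICE_USD = 200            # price of one Codex subscription
HEADLINE_COST_USD = 1_300_000  # raw API figure cited in the thread

def subscription_spend(accounts: int, price_usd: int = SUB_PRICE_USD) -> int:
    """Total subscription fees for a given number of accounts."""
    return accounts * price_usd

fast_accounts = round(BASE_ACCOUNTS * FAST_MULTIPLIER)
fast_spend = subscription_spend(fast_accounts)

print(f"without /fast: {BASE_ACCOUNTS} accounts, ${subscription_spend(BASE_ACCOUNTS):,}")
print(f"with /fast:    {fast_accounts} accounts, ${fast_spend:,}")
print(f"headline raw API cost is about {HEADLINE_COST_USD / fast_spend:.0f}x the /fast subscription spend")
```

Under those assumptions the subscription route comes to 150 accounts and $30,000 with /fast, against the $1.3M raw API figure.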
"The tokens are how anthropic makes profit" vs "It's not actually worth that amount of usage"

y'all can't have it both ways; either it's really worth the cost or it's a bunch of token burn with no smoke.

> unless they're subsidizing all this

They literally are. (If by "all this" you mean the subscription future bait-and-switch plans.)

But even that $5k-$6k of monthly usage on a $200 Codex subscription, let alone going over their limits, is unrealistic in the long term, and that is just ONE person.

Let's say I was at the casino and was spending a lot on casino chips, but I also happen to work at the casino. I'm not really losing money whether I win or lose, since I'm using the house's money and there's little risk involved in every dice roll or press of the button. The risk is far higher if I don't have that level of access and continue to spend the same amount of money on lots of tokens (or casino chips, spins or button presses).

The same is true here with these agents. Some companies will realize that they can no longer afford to spend millions a month on tokens or even startups spending $5k - $6k per person per month on tokens.

I can only see efficient local models making sense as a way to recover from this unnecessary spending, or even light gambling, on tokens.