by pixel_popping 62 days ago
I disagree, it has improved enormously, especially at staying consistent on long tasks. I have had a task running for 32 days (400M+ tokens) via Codex, and that's only since gpt-5.4.

Has that task accomplished anything yet?
I think the OP is in for a rude surprise when the task is “finished”.
It will go somewhat like this:

"You're really not going to like it," observed Codex.

"Tell us!"

"All right," said Codex. "The answer to your Great Question..."

"Yes...!"

"Is..." said Codex, and paused.

"Yes...!"

"Is..."

"Yes...!!!...?"

"Forty-two," said Codex, with infinite majesty and calm.

I bet you've asked Codex for that joke :p
Too soon to tell, give it a billion tokens before we make up our minds
Oh boy, that's far short of what it requires; we are probably talking 3B+. Note that this is just Codex. Obviously it is also running automatic adversarial review with the regular zoo (gemini-3.1-pro-preview, opus-4.6/4.7, gpt-5.3-codex, minimax-2.7, glm-5.1, mimo-2 (now 2.5) and so on, you get the gist) :)
what is that task doing???
Interesting that they had the opportunity to explain but decided that hyping it more made more sense. 3 billion tokens!!1!
The correct question is: what isn't that task doing?
Kept the OP employed for a full extra month at their high AI metric firm, hopefully.
Just making Jensen proud is all.
It made Sam richer.
I don't know their margins so I can't really say, but we do have 8 OpenAI accounts, and I doubt they're making that much from us, seeing that there isn't a single hour where we don't saturate the accounts.
Wtf are you even talking about? Sam has zero stake in OpenAI.
Of course he doesn't.
That’s actually crazy, what kind of task is that? And is that a recurring kind of task like some analysis, or coding related?
Coding (along with docs and tests, obviously): rewriting a huge chunk of the KVM hypervisor (on kernel 7, started at -rc2), plus KSM and other modules. I can't say too much about it yet (might do an announcement in coming weeks). The coding is automated, but the plan took days of manual arguing beforehand (with every model possible), while I did other things during the waiting times, as I currently manage 70 repos for the upcoming release of our beta.

I think users really underestimate the capabilities of "AI" when it's used with the right tooling, combinations of models, and procedures (and loops). I'm saying that with two decades of development behind me: I genuinely don't agree with people who say it produces slop of any kind. At this stage, that's mostly the fault of the prompter (or of the prompter not having enough tokens to do mass adversarial review). I can honestly state that the code produced is, overall, the SAME quality as what I would write by being extremely meticulous.

I'm like a bot following 30+ threads concurrently. Sometimes it's fun, sometimes it feels like playing at a casino, sometimes it's boring, but this is truly an insane era if you have the funding for it. Obviously we stack many MANY accounts in rotation 24/7; the equivalent API cost for my usage alone is about $100K+ a month, but we pay only a fraction of that thanks to the subscription plans.

PS: I have 8 monitors in front of me to manage all that (portable monitors stacked together).

Please do an update when you're ready, this sounds like madness to me so I'd love to see what the output is. Whatever it is I have to know.
Typical AI psychosis. They might notice it soon or stay in this condition for months.
I don't think you really grasp the direction the world is taking, or really understand AI capabilities when they're put together to reach high automation. You might not agree with it or embrace it yet, but you will be joining the loop wagon soon enough.
Yeah right. Sam Altman is as high as you on this drug, but you both are going to wake up soon.
Is it hitting intermediate milestones with solid pre-written and human-reviewed acceptance tests? If not, sounds like a very risky commitment.
Please do a post about this (though I realize that takes time). This sounds amazing. I have always dreamed of doing this too but just don't have the budget.
Specifically, write a post about this and do not have Claude write a post about this.
I have yet to talk to someone who is taking this approach and doesn’t end up with a dumpster fire, but here is to hoping this time is different.

Hope it works and you post about it.

I hope it doesn't work and they don't post about it.
It's just too bad the subsidized costs mean they won't actually feel any real punishment for their failure. Like normally time wasted on its own is enough of a punishment for making a poor decision, but they're not even doing anything themselves here!
I'm also in that boat of not understanding how people fail to get a huge productivity boost from GenAI. And it's not just novices but sometimes seriously accomplished coders. It can't be that they're just typing 'Make me an ERP' and then going 'these things are dumb slop machines', right?
I’m vague on a specific reason for this feeling because there are a few to choose from and none overpowers the others, but the emotion that comes to mind when I read this is disgust. As a society I feel we will look back on the subsidized opulence of this moment with total and utter contempt.
There's no opulence in spending tokens for entertainment. Vibecoding your own game is the new viral game.
I know exactly the feeling you mean. I get a much stronger version of it when I talk with friends who frequently take a plane for a 250-mile trip that has a world-class, comfortable high-speed train connection with very frequent trains, each taking less than 3 hours. I'm sure you have friends who would do this in that situation. Do you feel the same disgust when you hear them talking about such choices?

I still haven't seen a single person clamoring about the environmental cost of AI who actually cares about the environment and has willingly made significant sacrifices for it. Every time I see someone do it, it's someone who never cared about this before, and still doesn't really. Someone who buys plenty of new clothes and furniture, loves a good burger, has the latest iPhone, and flies 4 times per year.

Maybe you're the unicorn in which case fair enough, you've earned the right to feel disgusted.

Or nostalgia for simpler times
That as well. But everyone reading GP’s posts knows in their bones that it’s unsustainable. It’s economically unsustainable and environmentally unsustainable, and in that context it strikes me as pure hoarding behaviour. Taking as much as they can for themselves before the house of cards crashes down.

I have no sympathy for OpenAI or Anthropic as corporations, but if these are the new tools of the trade, then platform abuse like GP is bragging about serves only to destroy the livelihoods of the rest of us who are content to use our fair share.

There’s no such thing as a free lunch, and the bill always comes at the end.

I mostly hate it because the token crunch is now coming for us regular users because of people like this. A few people always ruin it for the rest of us.
> (might do an announcement in coming weeks).

Don't be surprised if/when people ignore your AI slop

...what? What kind of task are you running?