Hacker News
by fooker 41 days ago
This might not pan out to be the glorious victory of human craft you’re imagining.

Here’s a slightly different future - these AI rescue consultants are bots too, just trained for this purpose.

Plausible?

I have already seen Claude 4.7 handle pretty complex refactors without issues. Scale and correctness aren’t even 1% of the issue they were last year. You just have to get the high level design right, or explicitly ask it to critique your design before building it.

> You just have to get the high level design right, or explicitly ask it critique your design before building it.

Do you think people are not giving their agents specs and asking for input?

A thing I've noticed is that everyone thinks they prompt better than the next guy.
This. I have this buddy, who is not an idiot by any stretch of the imagination and is more adventurous than me in some ways ( I don't really run agents on my machine ), but when I look at his prompts, I sometimes wonder how he gets anything done at all. They are vague and angry demands.
Not sure about the angry part, but vague sometimes works really well. The important part is to have enough good context pushed into the context window beforehand (codebase explorations, docs, etc). Then a vague prompt of the general direction gives the autocomplete more “freedom” to figure out the “best” approach given the context.

Doesn’t work well ofc in a one shot situation with no context.

Yeah, sorry, I myself was being vague, because I don't want to give any identifying info even by mistake. You are right; generalizations here are not as useful. I was talking about something along the lines of 'can you make it better', but without the LLM having the context to understand what better could potentially mean. For brainstorming sessions, I love to start broad. Admittedly, I have limited experience with agents ( though my current project intends to bridge that gap ) so it is possible I am missing something ( plus, to your point, I don't know his full setup ).
Very often, no.
The ones who end up with messes, no
Maybe the professional devs, but not the vibecoders
One AI can't vibe code out of the mess, so you'd make another AI trained on getting out of vibe coded messes?

That's serious levels of circular thinking right there.

This is literally how training humans has worked for thousands of years.

We train humans to do things untrained humans can not do.

Humans, unlike LLMs, are capable of reasoning and thinking. Thus humans, unlike LLMs, can actually be taught and improve.
No it's not, don't be facetious.

That's not at all how AI training works.

And the bots training the bots are just bots that were trained to train bots?
Nothing that sexy, just thirty odd years of software engineering data from humans.

Commits, design reviews, whitepapers, code reviews, test suites. And, pretty concerningly, chat logs and even keystrokes from employees nowadays.

The way we train specialized bots now is incredibly inefficient; that part is rapidly improving.

I think that will happen. I think several things can be true at the same time:

- AI Hype

- AI Psychosis

- AI keeps getting better and better until it can work around big AI slop code bases

With GPT 5.4 or 5.5 I did not notice degradation in performance when it was working on a large 5k line file containing a WebView, JS scripts, as well as native UI.

I instructed it to split it up anyway, yet I wonder how often the concerns around the mess are imaginary rather than practical.

> AI keeps getting better and better until it can work around big AI slop code bases

The belief in this is a form of AI psychosis, I think.

Maybe in the future but certainly no evidence of this anytime soon

> Maybe in the future but certainly no evidence of this anytime soon

Here's some anecdotal evidence from me - I cleaned up multiple GPT 4.x era vibecoded projects recently with the latest Claude model and integrated one of those into a fairly large open source codebase.

This is something AI completely failed at last year.

Maybe you should try something like this or listen to success stories before claiming 'certainly no evidence' in future?

There are untold billions of dollars to be had if you can make this future come to pass. You don't need AGI to make it happen either. You just need to keep making the context windows bigger and keep coming up with updated training data. It's not the outcome I want, but it really does feel within reach. The only limiting factor is going to be token count and cost to process/generate those tokens. But if you don't particularly care about quality, costs are going to have to go up by several orders of magnitude before you start to regret firing your software engineers.

I don't know what happens in a decade when there are no junior engineers, skilled senior engineers are becoming rare, and the only data left to train LLMs on is 200th-generation slop. But AI slop being qualitatively slop is not enough of an obstacle to prevent that future from coming to pass. And billions of dollars will be "saved" along the way.

Companies are already putting billions out there just to secure and produce training data. And that's the issue: spending X billions to make (X-Y) billions isn't a profit, it's a gamble hoping Y becomes negative (or at least close to zero with a commodity that is profitable). Real profits have not been made directly from the work on AI as of now. They're made from marketing a narrative of AI working.

That's what makes this whole house of cards dangerous. The prescription to psychosis is profitable. Aka, selling a grift.

I have personally had success telling Claude that some AI-written system is too complicated and asking it to rewrite it in a more logical way. This sometimes results in thousands of lines of code being deleted. I give an instruction like that if I see certain red flags, e.g.:

1) same business logic implemented in two different places, with extra code to sync between them

2) fixing apparently simple bugs results in lots of new code being written

It’s a sign I need to at least temporarily dedicate more effort to overseeing work in that area.

I somewhat agree with the AI psychosis framing of the OP. It takes some taste and discipline to avoid letting things dissolve into complete slop.

No evidence? ChatGPT came out 3 years ago. You basically just need to stick a ruler up on a curve.
I'm no expert, but the skeptic's opinion I've heard would be to ask:

What evidence is there that we're not at or close to a plateau of what LLMs are capable of? How do you know the growth rate from 2023 to present will continue into 2029? E.g., is it more training data? More GPUs? What if we're kind of reaching the limits of those things already?

Ultimately, you are describing a fundamental problem with induction -- Hume's problem of induction to be specific. How can we know that anything that has been shown empirically in the past will continue to be true - we can't. Best to investigate mechanistically:

I don't see why we would assume that we are at a plateau for RL. In many other settings, Go for instance, RL continues to scale until you reach compute limits. Some things are more easily RL'd than others, but ultimately this largely unlocks data. We are not yet compute/energy/physical world constrained. I think you would start observing clear changes in the world around you before that becomes a true bottleneck. Regardless, currently the vast majority of compute is used for inference not training so the compute overhang is large.

Assuming that we plateau at {insert current moment} seems wishful and I've already had this conversation any number of times on this exact forum at every level of capability [3.5, 4, o1, o3, 4.6/5.5, mythos] from Nov 2022 onwards.

I think we're close to the plateau of what LLMs can do, but they will keep improving. IMHO the results are already showing diminishing returns.

The (leading) LLMs work by consensus, like Wikipedia, OpenStreetMap, web search engines, or the open source movement.

What I mean is if I ask an LLM "create a linked list", its understanding (of what I want) is already close to the expected ideal. Just like the Wikipedia article on linked lists, for example.

But the LLMs will continue to improve in breadth and depth of understanding the world, although technically (what they CAN do) they probably already peaked. Similarly, the OSS movement technically peaked in the 90s with the creation of a compiler, an operating system, and a database; that doesn't mean that new open source isn't being created.

There is so much money at stake, and so much money pouring into AI development, that I think we are going to continue to see gains for a while. People keep coming up with new agent harness techniques like chain of thought, tool calling, and memories. And then the big LLM companies figure out how to actually train their models to optimize the use of those techniques. To claim that we are reaching the top of the plateau is to claim that we are out of effective ideas for improvement. I think that's a ridiculous claim; the technology is too new. And because of the strong incentives to keep making these things better, it's pretty much a given that people will continue to explore ideas until we really are out of effective ideas. I don't think anyone apart from professional AI researchers has any idea where this is all going to settle.
Since we're not experts, we treat it as a black box. What are the results? Is the quality of the results improving? Is the improvement accelerating or decelerating?

And the answer appears to be that the improvement is accelerating. So how could it be stopping?

https://metr.org/time-horizons/

I don’t think improvement is accelerating. We went from “computers can’t do these things at all” to “now they can” in a few years with the discovery of transformers, and now we get “it can do the same things, except incrementally better, at a drastically higher cost” every few months.

I don’t think that the current AI paradigm has infinite headroom for improvement, similar to how every other AI approach before it eventually hit a limit.

I'm more curious about how much more capability they can get before the economy collapses.
It's amusing to me that:

* A belief that AI will keep getting better, presented without evidence, does not yield a lot of skepticism around these parts.

* Your comment saying it is wrong to believe AI will keep getting better, also presented without evidence, is downvoted.

AI is currently, actively getting better. That might stop in a year, and you can argue whether it’s linear or exponential, but it’s very difficult to argue that it won’t get better in the near term. On the other hand, arguing that we hit a plateau at this time is just ignoring reality.
This doesn't address anything I said in the above comment.