| HN Mirror


	by slopinthebag 85 days ago
	What kind of software are people building where AI can just one shot tickets? Opus 4.6 and GPT 5.4 regularly fail when dealing with complicated issues for me.

4 comments

withinboredom 85 days ago

Not just complicated ones, but even simple ones if the current software follows a pattern so “new” that they’ve never seen it or been trained on it.

slopinthebag 85 days ago

I dunno if Rust async or native platform APIs, which have existed for years, count as new patterns, but if you throw even a small wrench in the works they really struggle. But that's expected really when you look at what the technology is - it's kind of insane we've even gotten to this point with what amounts to fancy autocomplete.

thin_carapace 85 days ago

i don't see anyone sane trusting ai to this degree any time soon, outside of web dev. the chances of this strategy failing are still well above acceptable margins for most software, and in safety-critical instances it will be decades before standards allow for such adoption. anyway we are paying pennies on the dollar for compute at the moment - as soon as the gravy train stops rolling, all this intelligence will be out of reach for most humans, unless some more efficient generalizable architecture is identified.

heavyset_go 85 days ago

> as soon as the gravy train stops rolling, all this intelligence will be out of access for most humans. unless some more efficient generalizable architecture is identified.

All Chinese labs have to do to tank the US economy is to release open-weight models that can run on relatively cheap hardware before AI companies see returns.

Maybe that's why AI companies are looking to IPO so soon, gotta cash out and leave retail investors and retirement funds holding the bag.

PeterStuer 85 days ago

They could still eliminate relatively cheap hardware.

thin_carapace 85 days ago

i was under the impression that we were approaching performance bottlenecks both with consumer GPU architecture and with this application of transformer architecture. if my impression is incorrect, then i agree it is feasible for china to tank the US economy that way (unless something else does it first)

heavyset_go 84 days ago

I think it just needs to be efficient or small enough for companies to deploy their own models on their hardware or cloud, for more inference providers to come out of the woodwork and compete on price, and/or for optimized models to run locally for users.

Regarding the latter, smaller models are really good for what they are (free) now, they'll run on a laptop's iGPU with LPDDR5/DDR5, and NPUs are getting there.

Even models that can fit in unified 64GB+ memory between CPU & iGPU aren't bad. Offloading to a real GPU is faster, but with the iGPU route you can buy cheaper SODIMM memory in larger quantities, still use it as unified memory, eventually use it with NPUs, all without using too much power or buying cards with expensive GDDR.
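As a rough back-of-envelope sketch of what "fits in 64GB" means for the weights alone: the parameter counts and bit widths below are illustrative assumptions, not specific models, and the estimate ignores KV cache, activations, and whatever the OS itself needs.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Hypothetical model sizes (billions of parameters) and quantization levels.
for params_b in (8, 32, 70):
    for bits in (16, 8, 4):
        size = weight_footprint_gb(params_b, bits)
        verdict = "fits" if size < 64 else "does not fit"
        print(f"{params_b}B @ {bits}-bit: ~{size:.0f} GB of weights, {verdict} in 64 GB")
```

Real headroom is smaller than this suggests, since context length and runtime overhead eat into the same unified memory.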

Qwen-3.5 locally is "good enough" for more than I expected, if that trend continues, I can see small deployable models eventually being viable & worthy competition, or at least being good enough that companies can run their own instead of exfiltrating their trade secrets to the worst people on the planet in real-time.

g947o 84 days ago

I mean, they have been doing that for at least a year, and I haven't seen signs of US economy tanking?... You need to find some better arguments

heavyset_go 83 days ago

There aren't any released open-weight models that are "good enough" yet, but Qwen-3.5 is getting really damn close to the point where more than half of my LLM usage gets routed to it.

I suspect, but don't know, some fields of inquiry will be fruitful when it comes to "good enough" small models. Especially when it comes to constrained tasks like software development. Software development models don't have to generalize to anything a chatbot can be asked or tasked with, the space it's required to generalize on is pretty small compared to literally the whole world.

If I was a betting man, I'd put my money where my mouth is, but I'm not. I am betting with my time and focus that smaller local models are worth it, and will be worth it, though.

slopinthebag 85 days ago

Even in webdev it rots your codebase unchecked. Although it's incredibly useful for generating UI components, which makes me a very happy webslopper indeed.

thin_carapace 85 days ago

i'm grateful to have never bothered learning web dev properly. it was enlightening witnessing ChatGPT transform my ten-second ms paint job into a functional user interface

m00x 85 days ago

Several fintechs like Block and Stripe are boasting thousands of AI-generated PRs with little to no human reviews.

Of course it's in the areas where it doesn't matter as much, like experiments, internal tooling, etc, but the CTOs will get greedy.

slopinthebag 85 days ago

I don't think anybody is doubting its ability to generate thousands of PRs though. And yes, it's usually in the stuff that should have been automated already, AI or not.

sigseg1v 84 days ago

Depends on your circle. On HN I would argue that there are still a fair number of people who would be surprised to see what heavy organizational usage of AI actually looks like. In the several non-programming online groups I belong to, people still think that AI agents are the same as they were in mid 2025 and that they can't answer "how many R's are in the following word:". Same thing when chatting with my business owner friends. The majority of the public has no clue about the scale of recent advancement.

nerptastic 84 days ago

Not arguing, but I just prompted Opus with a made up word and it responded with this:

“There are 4 Rs in the word “burberrorrly.” Here they are highlighted: burberrorrly (positions 3, 6, 7, 9)”

Obviously not a real word, but perhaps the fundamental concept remains
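For reference, the count is easy to check mechanically. A minimal Python sketch (`letter_positions` is just a helper name made up here, not anything from the thread):

```python
def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-based positions where `letter` occurs in `word` (case-insensitive)."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]


positions = letter_positions("burberrorrly", "r")
print(len(positions), positions)  # 5 [3, 6, 7, 9, 10]
```

So the quoted answer of 4 (positions 3, 6, 7, 9) misses the R at position 10.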

thin_carapace 85 days ago

these companies contribute to swathes of the west's financial infrastructure, not quite safety critical but critical enough, insane to involve automation here to this degree

girvo 84 days ago

GPT 5.4 straight up just dies with broken API responses sometimes, let alone when it struggles with an even moderately complex task.

I still can't get a good mental model for when these things will work well and when they won't. Really does feel like gambling...

victorbjorklund 85 days ago

Of course not all tickets are complex. Last week I had to fix a ticket which was to display the update date on a blog post next to the publish date. Perfect use case for AI to one shot.

yrds96 84 days ago

I'm using Opus on Claude Code and even on easy tasks, if you do not review the changes properly, it creates tech debt. One of the most common issues is replicating the same logic in multiple places with differently named variables (which makes it harder to find with grep on future changes), along with not following project patterns. This happens even with a lot of .md files instructing it to do the opposite. I still haven't found a workflow without human interaction that is that efficient and reliable.

nerptastic 84 days ago

I suppose at that point I’m wondering if it would have just been faster for… you, (I’m assuming) the developer to make that change and deploy it? Is the AI really faster on small changes like that, if you understand the platform/code/CI/CD enough???

Maybe for a non-dev it would be nice to submit a ticket and have it auto-fixed by an agent. But in the dev's case, it feels like it would be faster to just do it manually.
