| HN Mirror



	by eithed 51 days ago
	What I find fascinating is that there is so little substance in this article about the quality of the produced code and the medium. Is the code documented and tested? Is it understandable and extendable? Is it secure? What language, framework, and database were used? The author mentions judgement and taste; well, is the code tasteful? Will the model re-architect the entire thing if I ask it to add new functionality, spending another 9.5h in tokens? I assume the research part is domain knowledge, i.e. how different types of travel translate to time, plus making it presentable; how did the author verify this? These questions are not even about AI: if I gave money to a human agency and got back something they told me works, I would ask the same questions. If I did not know how to evaluate it, I would hire people who do. With LLMs, the verification part is what bothers me the most.

17 comments

an0malous 51 days ago

These posts are never written by software engineers, it’s always some tech exec, retired engineer, or VC. This author is apparently a professor at the Wharton School of Management? None of these people have to ship or maintain real products, they’re just making side projects.

The only decent software engineering perspective I’ve seen has been from Mitchell Hashimoto.

jimbokun 51 days ago

Well that’s kind of the point.

They can just summon bespoke software out of the ether that only handles the use cases of themselves and a few of their collaborators.

Making “side projects” was not possible for non-developers before powerful LLMs. Now it is.

an0malous 51 days ago

I don’t think that’s true, I think these authors are making a much stronger claim that AI is proficient or even an expert at software engineering. This author describes how complex and sophisticated their software is, and the only value he’ll concede to “coders” is that there might be a few bugs they’d need to fix.

Imagine not being an architect and using Claude to put together a building plan, then concluding it’s basically done but we might need a real architect to double check the measurements. It may even be true but I’d be skeptical if it’s always non-architects saying this.

21asdffdsa12 50 days ago

And we have kind of been here before. The "proto"-type is almost complete. It's just a little slow, a little spaghettificated, just written in Excel VBA, clicked together in node graphs, or built with the next hot thing that makes coding unnecessary.

bathtub365 51 days ago

Why do they even need coders to fix these bugs? It would be an order of magnitude (at least) cheaper to ask Claude to find and fix them, and it will likely be successful.

Building in the physical world has physical and time constraints that cannot be overcome, which is one of the reasons architecture (and engineering) are so important in this domain. In software development these constraints were only inherent when people were writing the majority of the software. I feel like I’m seeing what I thought were fundamental constraints being eroded by the increasing speed and correctness of these tools and it’s making me reconsider the importance of some of the values that are held by software engineering.

It’s obviously dependent on the domain and solution, but if your software can be extremely rapidly rearranged, bugs found and fixed with little effort, and features added with only a minimum prompt, I think the entire definition of technical debt has changed. I’ve been sceptical of these tools and still approach their output with caution. I also worry that, as a software developer, if more can be accomplished in less time there will be less room on this planet for software developers.

phil21 51 days ago

> I think the entire definition of technical debt has changed. I’ve been sceptical of these tools and still approach their output with caution.

This very well summarizes my current thinking on the subject as well. And most of my career has been playing the role of technical debt nazi. Much to the detriment of my earning potential.

Does AI make incredibly inefficient code most of the time? Yup. But it does it at lightspeed with minimal effort.

I think many software engineers forget they exist to get real things done (in many cases at least) and they are a cost center for most businesses. If your end product is not selling software, very few people actually Doing the Thing(tm) will give a single solitary care about code quality or maintainability when they can just spend 30 minutes and $15 worth of tokens to fix it.

It won't take over everything, but I've already seen otherwise very intelligent go-getter types, who are not technical and don't know how to code, make extremely useful things for themselves and their small enterprises. And this will seemingly only get better and more efficient.

For someone who really does love the idea of well-architected and future-proof code, this is just icky to even say or consider. But I'm coming around to the idea that this is the future for the majority of software in most places. And it may have the ability to seriously level the playing field for small enterprises in some industries.

I'm currently using it to implement a zillion side projects at home I've been "meaning to get to" for years. It makes incredibly silly unmaintainable code most of the time - but I learned to not care, and just tell the AI bot to fix it/add to it as I go along. Worst-case I spend a single night deleting it all and starting from zero to "refactor" an entire thing.

prmoustache 50 days ago

> I think many software engineers forget they exist to get real things done (in many cases at least) and they are a cost center for most businesses. If your end product is not selling software, very few people actually Doing the Thing(tm) will give a single solitary care about code quality or maintainability when they can just spend 30 minutes and $15 worth of tokens to fix it.

I am surprised to hear people so naive as to expect their token usage to stay flat if code quality and maintainability start falling exponentially.

What if, to fix 2 bugs, your LLM starts adding 50 new ones? Will you tell your customers in the support channel "sorry, the software is finished; if we try fixing anything, everything else might break, so it's not worth it"? Or "we can probably fix it, but our AI usage will rise so much that we need to raise the subscription threefold, you choose"?

The speed at which LLMs code is matched only by the speed at which they add garbage to your repo. If you stop caring about maintainability, you also stop caring about your AI/LLM bills and the viability of your project past the PoC stage.

senordevnyc 50 days ago

> I think many software engineers forget they exist to get real things done

One billion percent. I think the vast majority of the anti-AI sentiments I hear from software engineers comes down to them caring more about playing with their tools than actually solving the problem.

locknitpicker 50 days ago

> Does AI make incredibly inefficient code most of the time? Yup. But it does it at lightspeed with minimal effort.

This hits the nail on the head.

Detractors often hang on to examples of coding assistants making mistakes or outputting subpar code, but they somehow miss the fact that coding assistants can also be prompted again to refactor whole swaths of code just as fast as they introduce oopsies. This means that even the worst-case scenario implies fast convergence to an acceptable outcome, and from there, fast iteration to improve upon it.

dv_dt 50 days ago

It's quick to build a hut in a green field, but slow to remodel the expanded building afterwards. I think that will remain true regardless of whether a team of software developers is doing it, or an AI with a product manager, or something in between.

mitxela 50 days ago

Technical debt remains the same. LLMs are found not to work as well when editing messy codebases - exactly the kind you get after using an LLM for a while. After a few weeks or months you have to either throw it away and start over, or involve a human at exorbitant prices.

squidbeak 50 days ago

> I think these authors are making a much stronger claim that AI is proficient or even an expert at software engineering.

The author specifically says:

> I am sure it is not perfect (I only spent an hour working with the results), but a software engineer would iron out the remaining potential bugs that I could not find quickly (which is one reason we may need more, not less, coders in the future, to help with the explosion of new uses for software)

which acknowledges pretty clearly that engineers bring a level of insight and experience still missing from Mythos. That said, I totally disagree with his contention that this will always be true. It's pretty weird that the author of an article stressing the steep improvements in a model's capability can't seem to imagine further improvements in that capability, as if Mythos is where development ends, or as if whatever gap remains between models and experts won't steadily narrow or eventually widen in reverse.

SpicyLemonZest 51 days ago

It is, and it's cool that it is, but the calibration is important. Statements like this:

> With Fable the spell has gotten powerful enough that I am no longer sure I am the wizard. I am closer to a patron. I describe what I want, I pay for it, and I judge the result. The conjuring happens somewhere I cannot watch, in hundreds of small choices I never get a vote on. The work has shifted from process to outcome. I no longer steer; I commission.

have a very different meaning coming from a non-technical researcher than they would from someone who builds software for a living.

shimman 51 days ago

Making side projects isn't a trillion-dollar industry, though. Add to that the fact that we are facing another global supply chain crisis due to the Iran War; the US is about to commit the biggest self-own in the history of empire.

Schmerika 50 days ago

There are actually quite a few trillion-dollar industries that exist thanks to "side projects".

Apple was Woz's side project, once upon a time. AdSense came from Google's 20% time. Social media started as a side project.

Forests grow from trees. Trees grow from seeds. More potential seeds = more potential forests.

queenkjuul 50 days ago

All the undiscovered Woz's of the world add up to a trillion dollars? There's $1T of money out there waiting to be spent on side projects?

The question was "are side projects a trillion-dollar industry", not "has a side project ever started an industry".

How much of a new $1T software product will anthropic capture in token costs, anyway?

zelphirkalt 50 days ago

The US has been on a course of self-owns ever since Trump got into office. That it is still a dominant global power shows how dominant it was before Trump, but that seems to be changing. With every self-own it commits, China laughs and inches a little closer. I think we will see the day when they are evenly matched within our lifetime.

But which self-own exactly do you mean, of the many there are?

bandrami 50 days ago

Well, right, but if the real use case for LLMs is "making software that wasn't economical to make before" that's bearish for the labs because it means they're only going to be chasing the low end of the market.

neilv 50 days ago

Relevant quote:

> I am sure it is not perfect (I only spent an hour working with the results), but a software engineer would iron out the remaining potential bugs that I could not find quickly [...]

People have said things like this many times in the past, and, in the past (perhaps not now), it's always been a misunderstanding of what is good and bad, what's difficult and easy.

For example, someone would draw a UI in a GUI painter that generates code (or a resource file), and a manager would see it and think the majority of the work towards the product was done. (Incidentally, there then seemed to be a reaction towards making UI mockups look abstract or otherwise different from runnable code, helping the nontechnical understand that this isn't 90% of the finished product.)

Or a student intern hacks out a homework-grade demo, and a manager who understands neither software engineering nor product domain says "we just need some engineers to polish it up for production", and thinks the student is a star and why can't their engineers be as brilliant and productive. (I might have once been that energetic intern, who was happy for the encouragement, but then learned more, and saw it was a thing.)

This common misunderstanding was sometimes self-correcting: when trying to ship became a disaster of misery and regretted attrition, or the product was poorly received by the market because it wasn't thought through or implemented well, or building subsequent functionality atop it was a nightmare. (But the adverse effects of bad approaches are one of the reasons for management and ICs to job-hop before those effects affect them personally.)

What might be different now is that some of these AI tools are outputting better-engineered work than some software engineers, and much faster.

At the back of my mind, I'm wondering how the really great software engineers will continue to stand out, as the discipline is being devalued in the minds of most leadership, and anyone can prompt an AI to generate something that superficially appears to them like what they assume a great software engineer would produce. (Even if the great engineer would do much better quality of implementation, have innovative ideas that ML from open source code would not, and maybe arrive at better product concepts as they worked through the problems.)

cgearhart 51 days ago

I’m starting to realize that LLMs are really good at building low-stakes projects. Your questions mostly presume that the stakes are higher. The software will last a long time; the requirements will evolve; we can’t tolerate mistakes; etc.

The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

qaq 51 days ago

You don't need LLM for that. You make _all_ projects low-stakes by working on green field project using (insert buzzword soup of the day) and leaving for a new green field opportunity (that requires experience with buzzword soup of the day) before the project ships.

DrJokepu 51 days ago

No, what you’re describing still requires you to do some actual work, and also, while you work there, there is still some level of accountability. A much, much better grift is coaching.

Like, an AI coaching session for executives at the yearly executive retreat. You show up, spend a few hours going through some nonsense slides ChatGPT put together for you, you charge an eye watering fee for it, HR or whoever organizes it will gladly pay for it because it will make them look all cutting edge in front of the CEO, by the next day everyone will forget about it. No accountability at all!

majormajor 50 days ago

In the LLM world you never get a chance to get paid to work on those greenfield projects because the person with the idea is churning the prototyping and discovery work themselves.

If you want to get paid to work on software, you get involved after its found success and the stakes get higher.

(Which assumes there are still significant areas where economies of scale reward that vs everybody just having their own DIY version of everything.)

owlbite 50 days ago

Or economies of liability and buck passing. I suspect managers and businesses will still want to be in the game of "not my fault, supplier is working on it, we can sue them if they don't meet SLA".

mcv 50 days ago

You've got to be the person with the idea. I'm currently doing that. I spent the past year working on a frustrating project where everybody else did everything wrong, so now I'm building it on my own, hoping to sell it to them. (No idea if that will work)

rpdillon 51 days ago

This is really insightful, but I think it also extends to making the project either low stakes or low complexity. I have this lurking feeling that the preferable architecture for software will change as a result of LLMs because they're good at working on low complexity modular components more than they are on high complexity million-line code bases.

ncruces 51 days ago

You'll just shift complexity to the orchestration of the modular components.

Monoliths vs micro-services.

rpdillon 50 days ago

There's some truth here. But in carefully orchestrated scenarios (the minority, to be sure), it can work surprisingly well, I think.

majormajor 50 days ago

They aren't necessarily as great at building low-complexity high-modularity components, though. ;)

Unless you know enough to tell them to! And keep them honest about it...

dchftcs 51 days ago

If there were a viable way to make all projects low-stakes, we'd have done it already. Consider this: microservices.

acedTrex 51 days ago

> The trick to getting good at using LLMs for software is to learn how to make _all_ projects low-stakes.

This doesn't really work in the real world. There are many things that actually matter, and engineering is fundamentally about handling them.

skywhopper 50 days ago

But not all projects can be low stakes. None of the important ones are.

spicyusername 51 days ago

> the quality of produced code and the medium

A thought I have been tossing around in my head as the models get better is that it really may not matter what the code looks like.

If the observed behavior of the software is good, then the software is good. If a bug, of whatever kind, can be fixed by a model in a vibe-coded codebase, then that's a fixable bug. If there are no exploitable vulnerabilities, then the code is secure. If the performance is adequate, then the code is performant.

It simply does not matter what the code looks like if, from the outside, it does what it's supposed to, and, from the inside, a model can fix any issue that is found.

More than ever, software engineering is now really a job about making sure the code is doing what it's supposed to.

And even if it DOES matter what the code looks like, you can have a model fix that too.

skydhash 51 days ago

The thing is that a lot of code relies on multiple layers of abstraction, each with its own correctness and failure states. And then you overlay the domain's correctness and failure cases on top of that.

But all of those notions of correctness are imaginary. The hardware only enforces a few (and it may be buggy). The OS adds some more (and it's buggy). The compiler/interpreter may have bugs (but that's rarely a nuisance), and the libraries are often brittle. There are cracks everywhere in the tower of abstractions.

The code has never mattered. What has always mattered is knowing the software's model of correctness (Naur's "Programming as Theory Building"), so that you can discern where a program is wrong.

The thing is, a crash or some other immediate error is actually nice to have. You get to react immediately and can have a core dump or a stack trace that points you to the error. What is truly terrifying is silent corruption (wrong order of operations, wrong values for a comparison that has expanded the idea of correctness, security issues that have been backdoored for years, …).

As Hoare said:

  There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
  The first method is far more difficult.

LLMs are very much the second kind. They write a lot of complicated code, and then you can no longer reason about its correctness.

gofreddygo 50 days ago

> There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

That is so real. Brilliant!

xnx 50 days ago

Source: https://www.goodreads.com/quotes/21638-there-are-two-ways-of...

eithed 50 days ago

Don't forget that LLMs are trained on human code. If they cannot understand what your code does, then they cannot make changes to it; or at least, having them understand your codebase becomes expensive (more trips to Anthropic's servers).

coldtea 51 days ago

>What I find fascinating that there is so little substance in this article about the quality of produced code and the medium.

I clicked one of his examples intrigued "a snake game where the snake is self-aware and crazy things happen;". Played for 1-2 minutes, and it's the classic 1980s snake game. Am I missing something? What is "self-aware" about it? Some funny messages at the bottom of the screen? And what are the "crazy things"?

starshadowx2 51 days ago

It sounds like you either didn't play enough or you are missing the new mechanics that get added over time. There's definitely more to it than just regular snake.

vunderba 51 days ago

I had the exact same thought. To me, it feels like they just took the fairly common “sentient video game character” trope and bolted it onto a very conventional snake game.

I will say, the way eating creates a "bulge distortion" that flows down the length of the snake is a nice touch, though.

kesor 51 days ago

You didn't play long enough. There are layers and layers and layers of features in that game if you play for 10 minutes or more.

nozzlegear 51 days ago

Can you spoil it for us?

soraminazuki 51 days ago

Welcome to every LLM discussion of the past 2 years or so. When asked for anything of substance, we're faced with a barrage of "but humans aren't good at this either!" Very little quantifiable evidence and lots of pure rhetoric.

skydhash 51 days ago

I’ve seen this pattern again and again, and I don’t bother replying. There’s also the “strong statement, and when you contradict it, they point to some particular circumstances that no one cares about”.

munksbeer 50 days ago

I think a lot of us have stopped talking to each other about this. I see it the other way round to you. I see constant scepticism and doubt that LLMs can build anything useful, and whenever provided with examples, the goalposts just move.

And at my own firm, I think every developer is generating most of their code using agentic coding. We're still sceptical enough that we are doing the usual heavy-handed human review process, so we're not seeing a huge speed-up in delivery times, but we are seeing a volume increase. That is because writing the changes and raising the PRs is much faster, but also because a lot of boring admin and support work is now mostly done by LLMs. Reports of instability, vague client requests, etc.? Throw the LLM at them and it usually figures it out while I continue to engineer.

So I know, first hand, that these things are very good. I also know second and third hand that pretty much every fintech in the industry is as heavily using agentic coding as we are.

And then I come to HN or reddit and I see people telling us that they cannot write decent production code, and this is just wrong. This isn't opinion wrong, it is objectively wrong. Any fintech that wants to keep up will tell you this.

I can't speak for other industries but I can't imagine they're different.

So, I'm not sure what to conclude from this. I don't want to be uncharitable, but when HN/reddit posts just don't match the reality I see for myself, I have no choice but to categorise them as emotionally driven to stick to a particular narrative, and so I dismiss them.

teliosix 50 days ago

It is all the same narratives as around the invention of the power loom, if you look into it.

What I also take from that time is that the hand-loom weavers were not incorrect. The power loom did not do as good a job as they did by hand.

You can still buy a hand-woven shirt today at a premium price.

There is a category error in treating quality as the product rather than as one input of the product.

You probably don't get to be a master craftsman without that quality mindset, so they aren't wrong, but they are missing the forest for the trees.

queenkjuul 50 days ago

I use Claude Code at a fintech, and I'm seeing garbage PRs from careless coworkers all the time. I'm having to correct Claude output regularly.

Yes, it does nearly all the typing for me now. But left to its own devices, it'll happily spit out awful code.

skydhash 50 days ago

> I see constant scepticism and doubt that LLMs can build anything useful, and whenever provided with examples, the goalposts just move.

> I see people telling us that they cannot write decent production code, and this is just wrong.

At least for me, that has never been the counterpoint I’ve been making. I’ve never cared about the code itself, especially with languages like Java and Kotlin, where you basically autocomplete most of the code, and with SDKs like iOS's, where you can collect snippets for most of the patterns you need. Same with frameworks like Laravel, where most big additions are done with the tooling. And because code is so repetitive, editors like Emacs and Vim have lots of features and plugins to help with copying and pasting (registers, macros, navigation, snippets, …).

And the fact is that some code you write today will be worthless tomorrow and will be replaced and deleted. So it’s very rare to care about a particular snippet or patch of code.

What I, and others, have been complaining about is the quality of the codebase and the sustainability of the practice, especially given the associated claims about increased productivity.

I care about correctness. Simplicity and a reduced amount of code increase my confidence that I can achieve it. New features, until tested in production, are more likely to decrease the reliability of the software. And with each bug fix, I need to make sure that I’m not adding five more.

To this day, I’ve not seen any compelling argument about writing better code reliably. I’ve seen a lot about writing more code. It’s like a manager thinking that if you’re not at your computer typing, you’re not working.

> We're still sceptical enough that we are doing the usual heavy handed human review process, so we're not seeing a huge speed up in delivery times, but we are seeing a volume increase

Are you seeing a quality increase? Less customer bugs, less outages, faster resolution? Are you measuring those?

munksbeer 50 days ago

> Are you seeing a quality increase? Less customer bugs, less outages, faster resolution? Are you measuring those?

We're not at the stage of measuring yet. We may be behind others, not sure. Actually, this isn't quite true. I was interested, so I created an ad-hoc report (with AI) on PRs landed per week over time. This has gone up over the last 6 months. But it's hard to say why. It might just be that people are raising smaller PRs because it's easy to have the AI split things up, whereas before, people were too lazy to do this.

Our bottleneck is still that we want humans to review. Sometimes we spot errors, but our pre-existing testing frameworks are very robust already, so if these pass, we're very confident to release to production, and the agent is excellent at understanding the existing testing frameworks and adding to them for new stuff.

So in our team, we don't often see blatant logic errors. It is mostly to do with things like using a pattern that is used elsewhere in the codebase (or not at all) and doesn't belong in our specific section of the code (we have a large monorepo). These become fewer as we enhance our ruleset (AGENTS.md or CLAUDE.md) for our particular developers.

skydhash 50 days ago

> And then I come to HN or reddit and I see people telling us that they cannot write decent production code, and this is just wrong. This isn't opinion wrong, it is objectively wrong

So how can you justify this comment of yours if you’re not measuring anything? Mind you, I can easily get good results from AI tools, but I don’t like the experience, and the code is often over-engineered and drifts away from my target architecture.

But the worst part is quickly losing sight of the tiny technical details that matter when solving bugs or altering features. I don’t like typing code. What I like is being able to go directly to the code I need to change, modify it, and then verify that it works. Most of my time is spent thinking deeply about the design of the software, which is orthogonal to code.

And if there is one thing common to people fully onboard with LLMs, it is that they can talk about the product, but they can’t argue about its behavior and its correctness. There’s no internal model that they can compare with the real code. They don’t know the edge cases, the technical pitfalls, or how the software will react if you modify one component. Any brainstorming session quickly turns into a slog because they cannot contrast approaches anymore. You can see the decay of understanding in real time.

viking123 50 days ago

Yeah, never concrete examples from these guys.

I am creating a game, and I can say that with the coding part the models help a lot, mostly GPT 5.5 high. Tbh, to me all the frontier models feel the same, and they can all solve the stuff I do quite well with some guidance and prompting. But that kind of makes me appreciate the other stuff more, like visual style, sound design, mechanics, etc. Tons of work still.

For brainstorming I find the models bad nowadays, or maybe I am just too critical of the results.

hypfer 51 days ago

Being the first to release an article gives you great SEO or whatever. Doing the things you've mentioned takes time.

jstummbillig 51 days ago

Less fascinating when you consider that this is a non-coder's perspective.

CobrastanJorji 50 days ago

It's still fascinating, but for a different reason. The "Concord" tool that got created bills itself as "Instrument-grade measurement of qualitative text. Explore in minutes, publish with honest statistics." Instrument-grade! How wonderful! That presumably means its accuracy has been verified and it has been carefully calibrated, right? What, nobody has ever measured or even examined the code? Well, no matter, let's go ahead and publish it and advertise it as "honest", "instrument-grade" measurement.

reedlaw 50 days ago

Yeah, the README looks like slop to me.

eithed 51 days ago

Fair enough, but entrepreneurship should, I guess, involve asking whether a given Next Big Thing has substance behind it or is just snake oil.

munk-a 51 days ago

Ah, but billions of dollars depend on those questions not being asked in a genuine manner. Don't you want a slice of that, or are you an... AI skeptic? *thunder clashes*

unholiness 51 days ago

Yeah, this made it basically clickbait for me, in terms of the time I wasted with the wrong expectation.

The lack of downvotes on posts on HN has always felt more like a bug than a feature to me.

nomel 51 days ago

So, the perspective of the one that gains the most, that will value this the most, and that will pay the most? ;)

andai 50 days ago

These days it's uneconomical for humans to verify AI-generated code. So we ask the AI to do it. Like when we asked the FBI to audit itself and they found no problems :)

chickensong 51 days ago

You probably don't care about the ingredients or engineering of asphalt, only if the road does its job well or is filled with potholes. Outside of the software industry, nobody gives a shit about code or databases.

geraneum 50 days ago

> You probably don't care about the ingredients or engineering of asphalt

Everyone does. You don’t think about it every day because we’ve delegated it to experts who don’t come up with a new composition of asphalt every time you press “generate”. It’s rigorously battle-tested and, short of intentional negligence, it’s consistent. I’m amazed how people are forgetting how the world actually works.

eithed 50 days ago

Exactly - the normalization of craft (?) is interesting

chickensong 50 days ago

You've missed the point.

geraneum 50 days ago

The point doesn’t seem to have been thought through.

munksbeer 50 days ago

The point is, if road engineers changed their process and materials, and to you it felt like driving on the same road, with the same wear and tear and potholes, you wouldn't even notice.

If AIs can generate code that looks ridiculous to humans but over time has the correct performance, the correct behaviour, no-one outside of software engineers will know or care.

skydhash 50 days ago

> The point is, if road engineers changed their process and materials,

They do those in labs, and then studies are made to prove that it can replace the current composition. They do not invent those on the spot and let the drivers QA the road.

> If AIs can generate code that looks ridiculous to humans but over time has the correct performance, the correct behaviour

It’s on you to prove that this big “if” can be realized. A -> B only matters when A is true.

mitxela 50 days ago

But they don't. LLMs can't understand messy code much better than humans can. Maybe a little, but not enough to compensate for the code they create being messy.

eithed 51 days ago

I agree. But if I'm paying for the road (even as a taxpayer), I get angry when after a year it's full of potholes and there are unnecessary signs warning about penguin crossings, making it cost twice as much as it should have (and don't get me started on why this road is really a highway leading to my house). I'd want certain qualities. And this article is basically = you will get a road, built quickly.

But yes, you are right: I don't build roads, I don't know what it costs to build one or how to judge whether one was built correctly, nor will I ever care to learn.

aix1 50 days ago

> And this article is basically = you will get a road, built quickly

That's not how I am reading it. You will get a road built exactly to your spec, quickly. So no penguin crossings unless you ask for them.

I am also not entirely sure how the pothole argument translates.

eithed 50 days ago

The road will be built to some spec, including features nobody asked for. If the model was trained on roads built in the Arctic, you will get penguin crossings.

fwip 51 days ago

Sure, but if there's a trillion dollar company saying that it's going to replace all our road workers or engineers - I'd want to listen to the opinion of an expert. Some reporter from CNN driving over it like "yeah seems good to me, good this" has approximately zero persuasive power to me.

Tylerian 51 days ago

The ingredients and composition of the tarmac are the difference between a road that holds up and one that is full of potholes after a week of use.

queenkjuul 50 days ago

I care that the engineer followed industry-standard best practices and used high-quality asphalt. How could I not care about that? How do you think potholes aren't related to the engineering of asphalt?

jknoepfler 50 days ago

There also isn't any meaningful articulation of why this is a "leap forward"... literally everything claimed in the article has been claimed in the same breathless tones in articles written a year prior.

I get that there's little sense in arguing with the MBA hivemind, but... c'mon.

I manage two teams of highly motivated, largely pro-AI engineers. Both teams have independently concluded that they needed to ramp down GenAI usage because of code quality / maintainability concerns. Both teams have suffered from protracted outages caused by LLM jank not being sufficiently fenced off and guarded against. Both teams have expressed concern that the code generated by LLMs is far too verbose, full of slop, and rapidly becomes an unmaintainable mess.

These are teams that are building non-trivial LLM solutions (deep agentic data synthesis and multi-modal data tagging). They are using the technology creatively and proactively, not just vibe-coding slop and throwing their hands up when it fails. Both teams will continue using GenAI coding agents, don't get me wrong, but the gains are incremental, not transformative, and need careful fencing to be sustainable.

Nothing in these articles resonates as real. People who work in reality don't agree. I don't understand why this shit keeps getting attention (or rather I do, but the reasons aren't good).

markoloko 51 days ago

So would you be more comfortable if the user then just prompted the AI to use a specific language, framework and database? Aren't we all just going to Reddit to find out what goes best with what? But I also don't trust anything from it, even though I've seen it.

jimbokun 51 days ago

Does it matter to the people requesting the software if it acts in the way they expect?

crystal_revenge 51 days ago

We've lived in a software bubble for so long that most software engineers have completely forgotten that the purpose of (most) software is to solve a problem. If the software solves the problem well and reliably, the quality of the code doesn't matter.

In fact, that's the entire reason we care about "quality code", because we assume that quality code is code that does what you expect well and consistently.

I say this as someone who hand-writes code pretty much every night for fun, just to experiment with computation. Which, oddly, is more fun than ever, because I don't feel any need to connect this type of programming with "real world software", and I can really enjoy code for its own sake, while my job is mostly just running agent loops (which I quite like as well).

SpicyLemonZest 51 days ago

I haven't forgotten that, I affirmatively think it's false. High quality code is necessary to solve problems reliably. Perhaps some people call things code quality when they don't matter (I really don't care what most variables are named), but there have always been teams who try to increase velocity by disregarding code quality, and from what I've seen AI does not stop them from shipping outages constantly.

munksbeer 50 days ago

Exactly. Quality of code is a programming invention to make it easier to write and maintain correctly functioning applications.

That is the entire purpose of "quality of code".

If the end user experiences a correctly performing application, now, and in the future, they don't care at all what the code looks like.

AIs could resort to a single global array of primitives and forget all about functions, and just use gotos if it helped them (it probably doesn't).

Anamon 46 days ago

That is only true for one-shot applications, though, whether written by human or machine. The reason we care about code quality is that we rarely get to write code once and never look at it again. Poor code quality makes maintenance and extension more difficult and expensive, again regardless of the degree of LLM support.

At least for human-written code, there's usually a thought and concept to be discovered underneath. For LLMs, one-shotting is all they know, and getting them to consider months or years of expanding and changing requirements will quickly turn into an impossible game of Twister.

munksbeer 45 days ago

I doubt this will turn out to be true. That seems trivially easy to solve for with correct training.

eithed 51 days ago

True, but you could say that about everything. Does it matter to you how the car drives, as long as it takes you to your destination? Well, yes, it matters: how it will handle a crash, whether it's possible to replace a part, and whether anybody can just open it if you leave it outside. I will be amazed if somebody shows me their home-printed car, but if they try to sell it to me as a new one...

sexylinux 50 days ago

It still makes errors, yes? Then it is not usable if we need to verify everything. AI is only interesting if it can do things that humans cannot do. If you can verify the results because you can do the work yourself, then why use AI? It will just tie up highly skilled people in verification work. Instead, these people should do the actual work, and results will come quicker.

So AI is only interesting to you / your org / humans if it can do things that you cannot achieve. But if it still makes errors, how could we ever know that a super-invention by AI is not wrong?

If we cannot rely on the correctness of the result, it is not usable at all. AI must always produce reliable and correct results. That was a very fundamental requirement for computing. This problem has not been solved.

fisf 50 days ago

By that measure, most software developers should be unemployed.

danlugo92 50 days ago

You have to adapt to survive, man; coping and denial don't help. AI is here to stay, and yes, it does require pilots, but this map would have taken you weeks to do. The AI did it in 10 hours, and you can still dedicate a week to refactoring.

Also, this is easily solved with .md spec files. This whole "bad code" cope is just FUD.

Anamon 46 days ago

I don't think that adding a text file saying "don't make mistakes" is going to get LLM output to the point where it no longer needs professional input, guidance, review and refinement. Such files don't make these systems more deterministic. There have even been study results showing spec files reducing prompt adherence.

grafporno 51 days ago

It's an ad.

otabdeveloper4 50 days ago

Don't harsh my vibes, man.

adamtaylor_13 51 days ago

I'm becoming more convinced these are questions of the Before Times. Yes, yes—heresy, I know.

Yet I can't deny the reality that I observe working with LLMs every day. If this truly is a step-function (as some are suggesting), then I have absolutely zero concern for the quality of the code.

fwip 51 days ago

Kind of a circular argument, isn't it? "Some people are saying it's very good at coding. If that's true, I don't care if the code is good."

adamtaylor_13 50 days ago

I didn't say I don't care if the code is good.

I said I had zero concern for the quality of the code. That is, I do not have concern that the quality of the code will be a concern in and of itself.

It's a subtle, but IMO important, difference. We only care about code quality insofar as it gives us stable, understandable systems. Historically that meant a human had to read and understand the code. Suppose a future where that's no longer the case; then we may still end up with stable, understandable systems without understanding every minutia of the substrate. In the same way, I don't really know if my compiler is correct, but the behavioral patterns of my code suggest it is, without me understanding anything about its code quality.