by montroser 12 days ago
Well, my team does what we call vibe engineering.

You do ask hard questions up front, define boundaries, give lots of high level architectural guidance, declare interfaces, and bounds of abstraction... And then you ask the LLM to make it so, and it does. You give it the structure, and it fills in the implementation.

This is engineering, more or less.
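To make "you give it the structure, and it fills in the implementation" concrete, here is a minimal sketch, not anything from the team's actual codebase. The `RateLimiter` interface, the `FixedWindowLimiter` class, and `check_contract` are all hypothetical names invented for illustration. The idea is that the human writes the interface and the contract check, and the model is asked to supply something like the class in the middle.

```python
from typing import Protocol


class RateLimiter(Protocol):
    """Interface declared up front by the human (hypothetical example)."""

    def allow(self, key: str, now: float) -> bool:
        """Return True if `key` may proceed at time `now` (in seconds)."""
        ...


class FixedWindowLimiter:
    """The kind of body a model might fill in: at most `limit` calls
    per `window` seconds, tracked separately for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._counts = {}  # maps (key, window index) -> calls used

    def allow(self, key: str, now: float) -> bool:
        bucket = (key, int(now // self.window))
        used = self._counts.get(bucket, 0)
        if used >= self.limit:
            return False
        self._counts[bucket] = used + 1
        return True


def check_contract(limiter: RateLimiter, limit: int, window: float) -> None:
    """Human-owned check that any implementation has to pass."""
    for _ in range(limit):
        assert limiter.allow("user-1", now=0.0)
    assert not limiter.allow("user-1", now=0.0)  # over the limit
    assert limiter.allow("user-2", now=0.0)      # keys are independent
    assert limiter.allow("user-1", now=window)   # a new window resets the count


check_contract(FixedWindowLimiter(limit=3, window=60.0), limit=3, window=60.0)
```

The contract check only pins down the cases it names, so it complements reading the implementation rather than replacing it.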


The rest of us who do similar work collaborate with the model to build richly defined spec files. Spec-driven development is a lost art in non-AI coding anyway. People who complain about these things either don't work with people who use AI correctly, and don't realize they just need to push for better standards, or they are terrible at architecting. I notice everyone who hates AI just finds any excuse to dismiss it instead of actually pushing towards more effective ways of using it.

I would love to see how perfect their “organic” code looks, because I won't be surprised if it's full of all sorts of issues, all in prod, for years, never known or spotted, just to be found and fixed by Claude in 15 minutes, with unit tests to ensure no regression is introduced.

> using AI correctly

There is no "correct" way to toss a coin. Some day people who are depending on LLMs blindly will understand that. All their notions of "correct use" are based on folklore and... vibes, which just maximizes token use under some misguided notion of "correctness".

LLMs are not deterministic. They do typically behave within pretty reasonable boundaries. Humans are not deterministic. They also typically behave within pretty reasonable boundaries. Engineering with LLMs and humans means understanding those boundaries and designing for them. This is a legitimate engineering problem like any other. I think the main misalignment I see is the expected productivity gain. When you are using real engineering discipline it is still very productive to use AI for coding, but not nearly so productive as many people claim once you factor in the fragility of their systems.

There is no correct use. There is no “correct” way to build systems. There is principled and disciplined use.

>Humans are not deterministic.

There is a pretty big difference. If you ask a human "Is X true?", and they say "yes", you can be 100% sure that they will always behave in a way that is logically consistent with X being true (talking about a competent and honest human being here, and when the implication is obvious). But by their very nature, there is no reason to assume the same of LLMs.

Tbh you can treat Claude like a junior developer and give it detailed feedback.

Genuine question: if I have a tightly-defined unit test and Claude writes a blob of code that passes it, does it matter what's in the blob?

It matters for at least a few reasons:

- Depending on the nature of your application, it may be very important to be able to audit the business logic and intended behavior. For compliance reasons, for operational reasons, for moral/ethical reasons -- you very well might want to affirm what the code is actually trying to do.

- A coding agent may get very creative in order to write code that passes a tightly-defined unit test. It may come up with approaches that technically pass, but work against the overall intention of the app in the first place. This becomes an arms race rather than a productive collaboration, where the agent's increasing creativity has to be matched by a sprawling test suite.

- Eventually, inevitably, business requirements will change, and the blob will need to evolve. It will be much easier for an agent or a human alike to understand how to safely make the change, if the existing implementation is transparent and understandable.

Two possibilities:

1. Your unit tests are exacting enough to fully specify the unit. In that case, congratulations, your unit tests are the code. They're also probably much more awkward to write, maintain, etc. Also, the compilation step to go from the unit tests to the actual code is now orders of magnitude more expensive, requires a SaaS to even work, etc.

2. Your unit tests are not that exacting and still leave ambiguity, edge cases, etc. In that case it very much matters what's in the blob of code, because while it could be a correct implementation of what you wanted, it could also be something else entirely that just happens to be correct for the part you did specify.
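A small sketch of the second case, using a made-up example (leap years) rather than anything from the thread. `check_leap_year`, `blob_a`, and `blob_b` are hypothetical names. Both blobs pass the same test, yet they disagree on an input the test never mentions.

```python
def check_leap_year(is_leap_year) -> None:
    """A test that looks tight but still leaves cases unspecified."""
    assert is_leap_year(2024)
    assert not is_leap_year(2023)
    assert is_leap_year(2000)


def blob_a(year: int) -> bool:
    # Passes the test, but ignores the century rule entirely.
    return year % 4 == 0


def blob_b(year: int) -> bool:
    # Gregorian rule: century years are leap years only if divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Both implementations satisfy the test as written.
for blob in (blob_a, blob_b):
    check_leap_year(blob)

# They diverge on a year the test did not cover.
print(blob_a(1900), blob_b(1900))  # prints "True False"
```

Adding 1900 to the test closes this particular gap, which is the first case again: each extra assertion moves the test closer to being the specification.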

If the test passes, you review the blob, and QA tests it. I don't see why it's any different from you having copied code from StackOverflow.

It does not matter for one instance. But it does matter if you plan to make a living off it.

Who ensures it followed the specs?

The more context an LLM gets, the more likely it will start to ignore instructions.

If the LLM runs a context compression, all bets are off. There's a reason Anthropic upped the context to 1M tokens: to reduce the chance of this happening.

> Who ensures it followed the specs?

The human. But only if you care about verification.

The human is missing from OP's description. "and it fills in the implementation". No human in sight.

You can't call it "engineering" if you don't care about verification.

If you build a bridge, the engineers aren't the ones doing the welding and crane operation and bolts and digging holes and whatnot.

They're the ones checking that work matches the plan.

Come on, now. The human writes the plan up front, which includes guidance on testing strategy, classes of tests, particular test cases to cover, etc. And just like normal, of course you don't just ship the code without doing manual verification, code review, auditing the test cases, and all the rest.

> Who ensures it followed the specs?

I mean, it's the same with building a bridge in the real world, right?

Someone has to check the work.

How do you do this? I really struggle to get the agents to follow my architectural invariants and coding conventions.

I use Cursor and Codex but the agents keep making regressions and breaking rules. They'll even take shortcuts sometimes, by doing things to make tests pass but with code that would be dangerous in prod.

Now, I use them file by file but it feels more like a typing assistant than something much more.

As of today they can't. You have to tell them what the new API looks like, which new classes they have to create and describe them in detail, etc... You have new projects that try to add good practices in the prompt [0] or audit your code once in a while [1] but it's not enough.

Right now they can be autonomous at finding bugs and inconsistencies. But not at architecture, or even at just creating a long enough PR without any guidance and feedback.

[0] https://github.com/ChristopherKahler/carl

[1] https://github.com/ChristopherKahler/aegis

When your AI slides, make a permanent test that catches that particular slide. Then have it run all the tests every time it does something significant.

We have as much test code as deployable code because the AI keeps finding ways to do what we told it to, but not what we meant.
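As a rough sketch of what one such permanent test could look like, assume the slide was the agent making tests pass by swallowing errors with `except ...: pass`. That scenario and the names `swallowed_exceptions` and `check_tree` are assumptions for illustration. Only the standard library `ast` and `pathlib` modules are used.

```python
import ast
import pathlib


def swallowed_exceptions(source: str) -> list[int]:
    """Return sorted line numbers of `except` blocks whose body is only `pass`."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
        and all(isinstance(stmt, ast.Pass) for stmt in node.body)
    )


def check_tree(root: str) -> dict[str, list[int]]:
    """Scan every .py file under `root` (a hypothetical source directory)
    and return the offending line numbers per file."""
    offenders = {}
    for path in pathlib.Path(root).rglob("*.py"):
        lines = swallowed_exceptions(path.read_text(encoding="utf-8"))
        if lines:
            offenders[str(path)] = lines
    return offenders


# The slide that was caught once, kept as a fixture so it stays caught.
BAD = """
try:
    charge(card)
except Exception:
    pass
"""

print(swallowed_exceptions(BAD))  # prints "[4]"
```

Wiring `check_tree` into the suite the agent runs after every significant change means that shortcut fails the build instead of reaching review.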

> This is engineering, more or less.

People who build bridges for a living shake their heads in dismay.

At least bridges come in the realm of unchanging physics and unchanging material behavior. There is only so much variety in building bridges.

Software on the other hand...

You see, the difference is that with building bridges, there is no value in building a "Toy" bridge that does not require any real knowledge. But even toy software can bring huge value. But that does not mean it does not require engineering discipline to build non-toy software.

Software engineering is not about learning libraries or tools. It is the art and science of managing complexity under constant change.

They already were well before LLMs.

    myFramework new myCoolStartup
    myFramework generate dataModel
    myFramework generate controllerForModel
    myPackageManager install coolViewWidgets
    # insert glue code I learned on Youtube here
    git push coolPaaS myBranch

If LLMs don't fix things, why are we spending trillions of dollars boiling the ocean?

The bridge architects and engineers are not the ones hammering in the nails.

Clearly you have never worked in heavy industry, or you would know that the word "build" is used at all levels, all the way up to the architect and real estate development level. Example:

https://www.buildordie.com/ (Build or Die is the web site for a mid-sized architecture firm.)

Even before AI, we saw drastic drops in software quality across the board. Even Windows has been going downhill for several major versions now.

What's your point?

You seem to be implying that since the current wave of AI started that things have gotten better. That is demonstrably, repeatedly, and completely false. Just cruise the HN front page and watch the AI fails scroll by.

That you point to Windows getting bad over the years, and the fact that it continues to get worse with the full AI buy-in of Microsoft, shows that AI is not some magical software savior.

When something is wrong, everyone complains; when nothing is wrong, you rarely hear a peep. I've either shipped to production or helped others ship code that others would blindly describe as "AI slop", despite my having reviewed and tested it. I've seen firsthand AI-first greenfield projects go into production and help small businesses achieve more sales and success. Heck, I reviewed such code for a relative who is now hiring developers and lets them code with AI so long as they review it, because it gave him something no software company in his market would offer.