HN Mirror

Hacker News


	by onion2k 98 days ago
	> Models aren't deterministic: every time you tried to re-apply, you'd likely get different output (unless you fed the current code into the re-apply and just let it recommend changes)

	If the result is always provably correct, it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinitely more important than the code itself.

8 comments

sensanaty 98 days ago

That "if" at the beginning of your sentence is doing a whole lot of work. Indeed, if we could formally and provably (another extremely loaded word) generate good code, that'd be one thing, but proving correctness is one of those basically impossible tasks.

xpe 98 days ago

> but proving correctness is one of those basically impossible tasks.

To aim for a meeting of the minds... Would you help me out and unpack what you mean so there is less ambiguity? This might be minor terminological confusion. It is possible we have different takes, though -- that's what I'm trying to figure out.

There are at least two senses of 'correctness' that people sometimes mean: (a) correctness relative to a formal spec: this is expensive but doable*; (b) confidence that a spec matches human intent: IMO, usually a messy decision involving governance, organizational priorities, and resource constraints.

Sometimes people refer to software correctness problems in a very general sense, but I find it hard to parse those. I'm familiar with particular theoretical results such as Rice's theorem and the halting problem that pertain to arbitrary programs.

* With tools like {Lean, Dafny, Verus, Coq} and in projects like {CompCert, seL4}.

dsr_ 98 days ago

Let's rephrase:

Since nobody involved actually cares whether the code works or not, it doesn't matter whether it's a different wrong thing each time.

brabel 98 days ago

You got it completely backwards. The claim is that if the code does exactly what the spec says (which generated tests are supposed to "prove") then the actual code does not matter, even if it's different each time.

ModernMech 98 days ago

The point they are making is that the tests alone are neither necessary nor sufficient to prove the code does exactly what the spec says. Looking at the tests isn't enough to prove anything; as an extreme example, if no one involved looks at the code, then the tests can just be static and always passing, and you wouldn't know either way whether the code matches the spec.

If anyone cared enough, they could look at the code and see the problem immediately and with little effort, but we're encouraging a world where no one cares enough to put in even that baseline effort because *gestures at* the tests are passing. Who cares how wrong the code is, and in what ways, if all the lights are green?
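To make the "static, always passing" case concrete, here is a minimal sketch. The spec (`slugify` lowercases a title and joins words with hyphens) and every function name are hypothetical, chosen only for illustration. The point is that a test whose verdict never depends on the result goes green for a correct and a wrong implementation alike; only a test that checks behaviour tells them apart.

```python
# Hypothetical spec for illustration: slugify(title) lowercases the title
# and joins its words with single hyphens.

def slugify_correct(title: str) -> str:
    return "-".join(title.lower().split())

def slugify_wrong(title: str) -> str:
    # Ignores the spec entirely and returns the input unchanged.
    return title

def always_green(impl) -> bool:
    """A 'static, always passing' test: it runs the code..."""
    impl("Hello World")
    return True  # ...but its verdict never depends on the result.

def real_test(impl) -> bool:
    """A test whose verdict depends on the behaviour the spec describes."""
    return impl("Hello  World") == "hello-world"

for impl in (slugify_correct, slugify_wrong):
    print(impl.__name__, always_green(impl), real_test(impl))
# slugify_correct True True
# slugify_wrong True False
```

Both implementations get a green light from `always_green`; nobody notices the difference unless someone reads either the code or the test.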

tomtomtom777 98 days ago

> If the result is always provably correct, it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinitely more important than the code itself.

If the spec is so complete that it covers everything, you might as well write the code.

The benefit of writing a spec and having the LLM code it is that the LLM will fill in a lot of blanks. And it is this filling in of blanks that is non-deterministic.
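A small sketch of what "filling in the blanks" can look like, under an assumed, deliberately underspecified spec: "return the users ordered by ascending age". The spec says nothing about ties, so two implementations can both satisfy it and still disagree. The names `impl_a`, `impl_b` and `meets_spec` are hypothetical.

```python
from typing import List, Tuple

User = Tuple[str, int]  # (name, age)

def impl_a(users: List[User]) -> List[User]:
    # Python's sort is stable, so users with equal ages keep their input order.
    return sorted(users, key=lambda u: u[1])

def impl_b(users: List[User]) -> List[User]:
    # Also ordered by age, but ties broken by name: the blank filled in differently.
    return sorted(users, key=lambda u: (u[1], u[0]))

def meets_spec(users: List[User], result: List[User]) -> bool:
    """Check the spec as written: same users, ages in ascending order."""
    ages = [age for _, age in result]
    return sorted(result) == sorted(users) and ages == sorted(ages)

users = [("zoe", 30), ("adam", 25), ("bob", 30)]
a, b = impl_a(users), impl_b(users)
print(meets_spec(users, a), meets_spec(users, b))  # True True
print(a == b)                                      # False
```

Both outputs pass the spec check, yet they differ; anything downstream that silently relied on one tie order would break if the code were regenerated and the blank got filled the other way.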

pjmlp 98 days ago

> If the spec is so complete that it covers everything, you might as well write the code.

Welcome to the usual offshoring experience.

SpaceNoodled 98 days ago

That's a huge "if."

gentooflux 98 days ago

I usually invert those to reduce nesting

FrankRay78 98 days ago

Sure, but where are the formal acceptance tests to validate against?

0-_-0 98 days ago

Besides, you can deterministically generate bad code and non-deterministically generate good code.

__loam 98 days ago

The code is what the code does.

kennywinker 98 days ago

The shoe is what the shoe does.

Except one shoe is made by children in a fire-trap sweatshop with no breaks, and the other was made by a well paid adult in good working conditions.

The ends don’t justify the means. The process of making impacts the output in ways that are subtle and important, but even holding the output as a fixed thing - the process of making still matters, at least to the people making it.

raw_anon_1111 98 days ago

The end is whether the code meets the functional and non-functional requirements.

And guess how much shoe companies that manufacture in sweatshop conditions make versus the ones that make artisanal handcrafted shoes?

uoaei 98 days ago

Functional requirements are known knowns.

Out-of-bounds behavior is sometimes a known unknown, but in the era of generated code it is exclusively an unknown unknown.

Good luck speccing out all the unanticipated side effects and undefined behaviors. Perhaps you can prompt the agent in a loop a number of times, but it's hard to believe that the brute-force, throw-more-tokens-at-it approach has the same level of return as a more attentive audit by human eyeballs.

raw_anon_1111 98 days ago

Are you as a developer 100% able to trust that you didn’t miss anything? Your team, if you are a team lead who delegates tasks to other developers? If you outsource non-business things like Salesforce integrations etc., do you know all of the code they wrote? Your library dependencies? Your infrastructure providers?

xpe 98 days ago

It seems like ^ and ^^ agree to me. Am I missing something?

kennywinker 98 days ago

Ah yes - we should all strive to maximize shareholder value - Triangle Shirtwaist be damned.

Btw in my metaphor, we - the programmers - are the kids in the sweatshop.

raw_anon_1111 98 days ago

If you are a “programmer”, you are going to be the kids in the sweatshop. On the enterprise dev side, where most developers work, it’s been headed in that direction for at least a decade, during which it became easy enough to become a “good enough” generic full-stack/mobile/web etc. dev.

Even on the BigTech side, being able to reverse a btree on the whiteboard and having on your resume that you were a mid-level developer isn’t enough anymore either.

If you look at the comp on that side, it’s also stagnated for a decade. AI has just accelerated that trend.

While my job has been, to varying degrees, about producing code for 30 years, it’s been well over a decade since I had to sell myself on “I codez real gud”. I sell myself as a “software engineer” who can go from ambiguous business and technical requirements, deal with politics, XY problems, etc.

pjmlp 98 days ago

What do you think programmers in offshoring consulting shops are? Sadly.

pjmlp 98 days ago

Yet the people voting with their wallets seem to go with the cheaper option, regardless of what hides behind it.

Whether it's shoes, offshoring, web widgets or AI-generated code.

kennywinker 98 days ago

Sure. People go for the cheapest option that fits their requirements, mostly.

But we’re the shoemakers, not the consumers. It’s actually our job to preserve our own and our peers’ quality of life.

The cheapest good option possible doesn’t have to be the sweatshop - though the shareholders of Nike or Zara would have you believe that - the labor movements of the 19th century proved that’s not the case.

pjmlp 97 days ago

It is our job to keep our job, or leave if we don't agree with management, assuming we're lucky enough to have the option to walk out and start anew right on the other side of the street.

kennywinker 97 days ago

This is what is sometimes called a “crabs in a bucket” mentality. It’s how you go from a middle class weaver, to an impoverished sweatshop worker in a generation.

jrm4 98 days ago

I would be very comfortable with this: re-run 100 times with different seeds. If the outcome is the same every time, you're reliably good to go.
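A minimal sketch of the "re-run with 100 seeds" check. `fake_generate` is a hypothetical stand-in for a sampled model call (it is not a real API), and `consensus` simply counts how often each output appears across seeds. The second call shows the limitation: a generator that is always wrong reaches full agreement on its wrong answer, so agreement measures consistency, not correctness.

```python
import random
from collections import Counter
from typing import Callable, Tuple

# Hypothetical signature: generate(prompt, seed) -> generated code as a string.
Generator = Callable[[str, int], str]

def fake_generate(prompt: str, seed: int) -> str:
    """Stand-in for a sampled model call (NOT a real API): usually returns one
    answer, occasionally another, depending only on the seed."""
    rng = random.Random(seed)
    return "return a + b" if rng.random() < 0.9 else "return a - b"

def consensus(generate: Generator, prompt: str, runs: int = 100) -> Tuple[str, float]:
    """Run `generate` once per seed 0..runs-1 and return the most common
    output together with the fraction of runs that produced it."""
    counts = Counter(generate(prompt, seed) for seed in range(runs))
    output, n = counts.most_common(1)[0]
    return output, n / runs

output, agreement = consensus(fake_generate, "add two numbers")
print(output, agreement)

def always_wrong(prompt: str, seed: int) -> str:
    # Hypothetical generator that gives the same wrong answer for every seed.
    return "return a - b"

print(consensus(always_wrong, "add two numbers"))  # ('return a - b', 1.0)
```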

SpaceNoodled 98 days ago

Even when it's wrong each time?

onion2k 98 days ago

If it's wrong then it's not provably correct (for any value of 'proof').

How you define your proof is up to you. It might be a simple test, or an exhaustive suite of tests, or a formal proof. It doesn't matter. If the output of the code is correct by your definition, then it doesn't matter what the underlying code actually is.
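As a sketch of "correct by your definition, whatever the underlying code is", the example below takes the middle option from the list above: an exhaustive test over a bounded domain. The spec (count the 1 bits of a non-negative integer) and the function names are hypothetical. The two implementations are written differently, yet both are accepted by the same check; note that this is only evidence for `0 <= n < 2**16`, not a proof for all integers, which is where tools like Lean or Dafny come in.

```python
# Hypothetical spec: return the number of 1 bits in a non-negative integer n.

def popcount_loop(n: int) -> int:
    # Inspect the lowest bit, then shift right, until n is zero.
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count

def popcount_kernighan(n: int) -> int:
    # n & (n - 1) clears the lowest set bit, so the loop runs once per 1 bit.
    count = 0
    while n:
        n &= n - 1
        count += 1
    return count

def spec(n: int) -> int:
    # Reference definition: count the '1' characters in the binary form.
    return bin(n).count("1")

# Exhaustive check over a bounded domain only.
DOMAIN = range(2**16)
for impl in (popcount_loop, popcount_kernighan):
    assert all(impl(n) == spec(n) for n in DOMAIN), impl.__name__
print("both implementations agree with the spec on", len(DOMAIN), "inputs")
```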