by raw_anon_1111 135 days ago
How do you verify the compiler without looking at the assembled code? How do you verify code that links against binary libraries?

You run it and check for your desired behavior.

Compilers have a finite set of inputs and outputs that should generate reproducible results. There's a larger number of possible outputs for the same question with AI and very little reproducibility.
Yes but once the code is written it’s not going to magically change. I am going to test the code just like I would test something I wrote - again like I’ve been doing for 40 years when writing my code by hand.
But your thought process during coding influences your testing. At least for most of us, we find edge cases or points of concern during coding that we place extra focus on in testing.

This is different than what you've done for the past 40 years because you're not testing your code. This would be analogous to you testing someone else's code. The vast majority of people and places have not followed that paradigm until AI showed up.

My thought process during my architecture influences my testing.

Since AI has been a thing, I’ve been in a customer facing cloud consulting role - working full time at consulting departments (AWS ProServe) and now a third party company - specializing in app dev.

Before my hands actually write a line of code or infrastructure as code, I’ve already spoken to sales to get a high level idea of what the customer wants, read over the contract (SoW) to see what questions I have, done discovery sessions/requirements analysis, created architecture diagrams, done a design review, created detailed stories/workstreams (epics), thought about all the ways things can go wrong, etc.

I very much keep my hands on the wheel and treat AI as a junior coder that might not follow my instructions. I can answer any question about architectural decisions, repo structure, what any Lambda does, the naming conventions, etc.

I’ve also intuited “these are the things that I need to think about and test for from my 30 years of professional experience as a developer and 8 years of experience across literally dozens of AWS implementations”.

In the before times, if I were doing this without AI, I would have to have two or three more junior people doing the work just because I couldn’t physically do it in 40 hours a week. Even then I would be focused on how it works and look for corner cases.

I don’t have to think about what I need to test for. I did specifically call out concurrency because there are subtle bugs.

Ironically, what I am working on now had a subtle concurrent locking bug that Codex wrote. I threw the code into ChatGPT thinking mode and it found it immediately and suggested better alternatives. I also have Claude and Codex cross check each other.

"I don’t have to think about what I need to test for."

Good luck then. The business process flow including edge cases should arguably be top of mind for what to test. Testing shouldn't be an afterthought but rather an integral thought when writing the code that needs to be tested.

"I would have to have two or three more junior people doing the work"

Yeah, and they're the ones thinking about testing the code they write. Architects (and it sounds like you are an architect and not a dev) don't get into that much detail.

If I’m starting off from sales -> reading the contract -> discovery -> design -> project plan -> implementation -> implementation review -> handover, how am I not involved with the business case?

I would never trust a junior developer who is just an experienced ticket taker (and most don’t get their first job after 10 years of being a hobbyist) to look at that level of detail. Honestly the code is the least important. What it does is. If I’m 50 years old and still just a “human LLM ticket taker”, I’ve done something horribly wrong in life.

By definition, this is the worst AI coding will ever be. Anyone hoping to stay in this game long term by being able to “codez real gud” is going to be in for a rude awakening.

Enterprise development, where most developers work, was becoming a commodity in 2016, when it was easy to become “good enough”, and comp still looks like it did on the high end a decade ago. Now it’s even harder to stand out from the crowd.

Now we are seeing that even at BigTech, “I can reverse a b tree on the whiteboard” developers are becoming a disposable commodity with all of the layoffs. There is a reason I’ve been moving up the stack and closer to “the business” over the last decade.

(Those are hardly analogous comparisons to LLM generated code, are they?)

So you do a vibe check?

What’s “vibe checking”?

I input x and I expect y behavior and check for corner cases - just like I have checked for correctness for 40 years. Why do I care how the code was generated as long as it has the correct behavior?
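A minimal sketch of what "input x, expect y, check the corner cases" can look like in practice. The function `parse_quantity` and its rules are hypothetical, standing in for any piece of generated code; the checks look only at inputs and outputs, never at how the body was written.

```python
def parse_quantity(text: str) -> int:
    """Hypothetical function under test; imagine the body was LLM-generated."""
    value = int(text.strip())
    if value < 0:
        raise ValueError("quantity must be non-negative")
    return value


def check_behavior() -> list[str]:
    """Black-box checks: return a list of failure messages (empty means pass)."""
    failures = []
    # Ordinary inputs: input x, expect y.
    for text, expected in [("3", 3), (" 42 ", 42), ("0", 0)]:
        if parse_quantity(text) != expected:
            failures.append(f"{text!r} should give {expected}")
    # Corner cases: each of these inputs must be rejected.
    for bad in ["", "-1", "abc", "1.5"]:
        try:
            parse_quantity(bad)
        except ValueError:
            continue
        failures.append(f"{bad!r} should raise ValueError")
    return failures
```

The same checks would apply unchanged if the body of `parse_quantity` were rewritten by hand, which is the point being argued here.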

Of course multithreaded code is the exception unless the LLM is putting a bunch of rnd() calls in the code to make it behave differently.
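To illustrate why multithreaded code is the exception, here is a hedged sketch of a stress test around a shared counter. The `Counter` class and `stress` helper are hypothetical and not taken from the project discussed above. A passing run shows the locked version behaved correctly this time; it does not prove a race is absent, because thread scheduling varies between runs.

```python
import threading


class Counter:
    """Hypothetical shared counter; the lock guards the read-modify-write."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two threads could read the same old value
        # and one update would be lost.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def stress(counter: Counter, threads: int = 8, per_thread: int = 10_000) -> int:
    """Hammer the counter from several threads and return the final value."""

    def worker() -> None:
        for _ in range(per_thread):
            counter.increment()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

With the lock removed, the same test may still pass on many runs, which is why a review of the locking logic (by a person or a second model) adds something that input/output checks alone do not.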