	by mikaelaast 135 days ago
	How do you verify the code without actually looking at it?

3 comments

adamzwasserman 135 days ago

Although I write very little code myself anymore, I don't trust AI code at all. My default assumption: every line is the most mid possible implementation, every important architecture constraint violated wantonly. Your typical junior programmer.

So I run specialized compliance agents regularly. I watch the AI code and interrupt frequently to put it back on track. I occasionally write snippets as few-shot examples. Verification without reading every line, but not "vibe checking" either.

mikaelaast 135 days ago

I like this. The few-shot example snippet method is something I’d like to incorporate in my workflow, to better align generated code with my preferences.

adamzwasserman 135 days ago

I have written a research paper on another interesting prompting technique that I call axiomatic prompting. On objectively measurable tasks, when an AI scores below 70%, including clear axioms in the prompt systematically increases success.

In coding this would translate to: when trying to impose a pattern or architecture that is different enough from the "mid" programming approach that the AI is compelled to use, including axioms about the approach (in an IF this THEN that style, as opposed to few-shot examples) will improve success.

The key is the 70% threshold: if the model already has enough training data, axioms hurt. If the model is underperforming because the training set did -not- have enough examples (for example hyperscript), axioms help.
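
A minimal sketch of what an IF/THEN axiom block in a prompt could look like, as opposed to pasting few-shot snippets. The `PromptSpec` and `build_prompt` names are hypothetical, and the two axioms are made-up placeholders for whatever pattern is being imposed; none of this is taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for a task plus optional guidance."""
    task: str
    axioms: list[tuple[str, str]] = field(default_factory=list)  # (condition, rule)
    examples: list[str] = field(default_factory=list)            # few-shot snippets


def build_prompt(spec: PromptSpec) -> str:
    """Assemble a plain-text prompt; axioms are rendered as IF ... THEN ... lines."""
    parts = [f"Task: {spec.task}"]
    if spec.axioms:
        parts.append("Axioms:")
        parts.extend(f"- IF {cond} THEN {rule}" for cond, rule in spec.axioms)
    if spec.examples:
        parts.append("Examples:")
        parts.extend(spec.examples)
    return "\n".join(parts)


# Placeholder axioms, invented for illustration only.
axiom_prompt = build_prompt(PromptSpec(
    task="Add a delete button to the item list.",
    axioms=[
        ("state must change on click", "put the behavior in the markup, not in a separate script file"),
        ("the server returns HTML", "swap the fragment in place instead of parsing JSON"),
    ],
))
print(axiom_prompt)
```

Keeping axioms and examples as separate fields makes it easy to try either style on the same task and compare results, which is the kind of measurement the 70% threshold depends on.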

moomoo11 134 days ago

"Let's check that we can do X, Y, Z"

"Create documentation and then write tests"

a few moments later...

"There's a bug where we cannot do Y. Investigate the code and then let's discuss the best fix"

"Update the documentation and tests"

raw_anon_1111 135 days ago

How do you verify the compiler without looking at the assembled code? How do you verify code that links against binary libraries?

You run it and check for your desired behavior.

giantg2 135 days ago

Compilers have a finite set of inputs and outputs that should generate reproducible results. There's a larger number of possible outputs for the same question with AI and very little reproducibility.

raw_anon_1111 135 days ago

Yes but once the code is written it’s not going to magically change. I am going to test the code just like I would test something I wrote - again like I’ve been doing for 40 years when writing my code by hand.

giantg2 134 days ago

But your thought process during coding influences your testing. At least for most of us, we find edge cases or points of concern during coding that we place extra focus on in testing.

This is different than what you've done for the past 40 years because you're not testing your code. This would be analogous to you testing someone else's code. The vast majority of people and places have not followed that paradigm until AI showed up.

raw_anon_1111 134 days ago

My thought process during my architecture influences my testing.

Since AI has been a thing, I’ve been in a customer facing cloud consulting role - working full time at consulting departments (AWS ProServe) and now a third party company - specializing in app dev.

Before my hands actually write a line of code or infrastructure as code, I’ve already spoken to sales to get a high level idea of what the customer wants, read over the contract (SoW) to see what questions I have, done discovery sessions/requirements analysis, created architecture diagrams, done a design review, created detailed stories/workstreams (epics), thought about all the ways things can go wrong, etc.

I very much keep my hands on the wheel and treat AI as a junior coder that might not follow my instructions. I can answer any question about architectural decisions, repo structure, what any Lambda does, the naming conventions, etc.

I’ve also intuited “these are the things that I need to think about and test for from my 30 years of professional experience as a developer and 8 years of experience across literally dozens of AWS implementations”.

In the before times, if I were doing this without AI, I would have to have two or three more junior people doing the work just because I couldn’t physically do it in 40 hours a week. Even then I would be focused on how it works and look for corner cases.

I don’t have to think about what I need to test for. I did specifically call out concurrency because there are subtle bugs.

Ironically, what I am working on now had a subtle concurrent locking bug that Codex wrote. I threw the code into ChatGPT thinking mode and it found it immediately and suggested better alternatives. I also have Claude and Codex cross check each other.
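
The bug itself is not shown in this thread. Purely as an illustration of the kind of test that targets a locking problem, here is a minimal sketch that hammers a shared object from several threads and checks that no updates are lost. `Counter` and `hammer` are hypothetical names, not code from the project described above.

```python
import threading


class Counter:
    """Hypothetical shared object; the lock makes increment atomic."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # without this, the read-modify-write below can interleave
            current = self._value
            self._value = current + 1

    @property
    def value(self) -> int:
        return self._value


def hammer(counter: Counter, threads: int = 8, per_thread: int = 10_000) -> int:
    """Run many increments in parallel and return the final count."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


result = hammer(Counter())
assert result == 8 * 10_000, f"lost updates: got {result}"
```

A passing run does not prove the absence of a race; a stress test like this only raises the odds of catching one, which is why a second review of the locking logic is still worth having.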

giantg2 134 days ago

"I don’t have to think about what I need to test for."

Good luck then. The business process flow including edge cases should arguably be top of mind for what to test. Testing shouldn't be an afterthought but rather an integral thought when writing the code that needs to be tested.

"I would have to have two or three more junior people doing the work"

Yeah, and they're the ones thinking about testing the code they write. Architects (and it sounds like you are an architect, not a dev) don't get into that much detail.

mikaelaast 135 days ago

(Those are hardly analogous comparisons to LLM generated code, are they?)

So you do a vibe check?

raw_anon_1111 135 days ago

What’s “vibe checking”?

I input x and I expect y behavior and check for corner cases - just like I have checked for correctness for 40 years. Why do I care how the code was generated as long as it has the correct behavior?

Of course multithreaded code is the exception unless the LLM is putting a bunch of rnd() calls in the code to make it behave differently.
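
A minimal sketch of that "input x, expect y" style of check, assuming a hypothetical `slugify` function standing in for the generated code. The test only looks at inputs and outputs and never at how the function is written.

```python
def slugify(title: str) -> str:
    """Stand-in for generated code; treat the body as opaque."""
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)


# (input, expected) pairs, with corner cases listed alongside the happy path
CASES = [
    ("Hello World", "hello-world"),
    ("  leading and trailing  ", "leading-and-trailing"),
    ("Already-Slugged", "already-slugged"),
    ("", ""),
    ("!!!", ""),
]


def check(fn) -> list[str]:
    """Return a description of every case where fn's output differs from the expectation."""
    return [
        f"{inp!r}: expected {want!r}, got {fn(inp)!r}"
        for inp, want in CASES
        if fn(inp) != want
    ]


failures = check(slugify)
print(failures or "all cases pass")
```

The table of cases is where the corner cases live, so the quality of this check depends entirely on how well those cases were chosen.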
