| HN Mirror


by EFLKumo 25 days ago

> just like we don’t read assembly, or bytecode, or transpiled JavaScript

This makes sense because a given piece of higher-level code deterministically produces a given piece of lower-level code, while an LLM offers no such guarantee. If the transpiled JS code doesn't work, we can track the bug down in the minifier or another build step, but we can't figure out why an LLM fails at a task, especially since LLMs, even SOTA ones, can be strongly affected by small prompt changes. With that in mind, I don't think this is sound reasoning for why we don't need to review AI-generated code.

> The LLMs produce non-deterministic output and generate code much faster than we can read it, so we can’t seriously expect to effectively review, understand, and approve every diff anymore.

Exactly. However, this could just as well argue for a lighter review standard rather than dropping review altogether. For example, devs could mainly review code design and interfaces, leveraging their *taste*, while leaving strict logical reasoning, validation, and testing to other tools or approaches. I'm not persuaded that the nature of LLM code generation must lead to abandoning code review completely.

Anyway, I'm not opposing the article, and its thinking about the coming shift is really good.

1 comment

trimethylpurine 25 days ago

Couldn't we slowly add guardrails that eventually lead to code generation becoming more and more deterministic over time?

In my experience, Claude has gotten better with every version at producing uniform code output, especially where the architecture is clear and documented, and even more so in languages with built-in uniformity (Go, HTMX, SQL), where there are intentionally only one or two ways of doing things. In such environments, the output is nearly deterministic.


EFLKumo 25 days ago

I once thought about this and found that n-shot examples have a larger influence on LLMs. In other words, in a repo with good code quality and architecture (which provides good n-shot examples) and on a task with clear instructions and goals, an LLM's output seems reliable enough, which matches your view. N-shot examples are also always better than relying on instruction following, which is what the article proposes ("specifications") as its way of coping with LLM productivity, so imo the idea you suggested is another possible alternative to the article's approach as well.
