HN Mirror

	by K0balt 61 days ago
	In my experience, Sonnet < Opus by a long shot for code review. Sonnet often flags things as errors that are not, because it fails to grasp the big picture… and it also misses structural issues in code that is perfectly fine line by line and only shows up as a problem at the meta scale. I have no reason to believe that the next generation won’t offer similar gains in verification, and there is some evidence that the cybersecurity implications are the result of exactly this expansion of ability.


thepasch 61 days ago

It depends on how you review. In an orchestrated per-task review workflow with clearly defined acceptance criteria and implementation requirements, using anything other than Sonnet (handed those criteria and requirements) hasn’t really led to much improvement, but it drives up usage and takes longer. I even tried Haiku, but, yeah, Haiku is just not viable for review, even tightly scoped, lol.

Siccing Sonnet on a codebase or PR without guidance does indeed lead to worse results than using Opus, though.
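
For illustration, the "handed those criteria and requirements" part could look roughly like the sketch below. This is not the actual workflow described above, just a minimal guess at its shape: `ReviewTask`, its fields, and `build_review_prompt` are hypothetical names, and the model call itself is left out on purpose.

```python
from dataclasses import dataclass


@dataclass
class ReviewTask:
    """One tightly scoped review unit. All field names are hypothetical."""
    title: str
    acceptance_criteria: list[str]
    implementation_requirements: list[str]
    diff: str


def build_review_prompt(task: ReviewTask) -> str:
    """Render a prompt that limits the reviewer to the stated items."""
    criteria = "\n".join(f"- {c}" for c in task.acceptance_criteria)
    requirements = "\n".join(f"- {r}" for r in task.implementation_requirements)
    return (
        f"Review the change for task: {task.title}\n\n"
        f"Acceptance criteria:\n{criteria}\n\n"
        f"Implementation requirements:\n{requirements}\n\n"
        # Keeping the reviewer on the listed items is the whole point of
        # the per-task scoping; open-ended review is a different job.
        "Report only violations of the items above, and cite the item.\n\n"
        f"Diff:\n{task.diff}\n"
    )
```

The orchestrator would build one such prompt per task and send it to whichever model does the review, instead of pointing the model at the whole PR with no guidance.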


K0balt 61 days ago

That makes sense. If your scope is tight enough, good enough is good enough. I’ve got the expected specifications and code style guides, including some aerospace engineering ones, but in complex systems I still run into hard-to-suss-out corner cases where the code works but the system breaks, usually due to unresolved conflicts in operational requirements.


thepasch 59 days ago

There’s definitely a ceiling for what LLMs are capable of, and I think aerospace engineering might just currently be it, haha.


K0balt 57 days ago

Lol yeah, I don’t think I’m ready to ride in the jet that Claude built lol. I should clarify that I use the code guidelines because they are solid guardrails for making things that perform predictably, not because I’m building MCAS lol. Let’s hope that “vibe aerospace engineering” is a way off for now.
