by alabut 24 days ago
Simon Willison drew a similar parallel recently:

https://simonwillison.net/2026/May/6/vibe-coding-and-agentic...

  “The thing that really helps me is thinking back to when I’ve worked at larger organizations where I’ve been an engineering manager. Other teams are building software that my team depends on.

  If another team hands over something and says, “hey, this is the image resize service, here’s how to use it to resize your images”... I’m not going to go and read every line of code that they wrote.

  I’m going to look at their documentation and I’m going to use it to resize some images. And then I’m going to start shipping my own features. And if I start running into problems where the image resizer thing appears to have bugs or the performance isn’t good, that’s when I might dig into their Git repositories and see what’s going on. But for the most part I treat that as a semi-black box that I don’t look at until I need to.”

Suppose the image resize service has some caching, and due to a bug in the caching, under certain circumstances it will respond with an already-cached resized version of a different source image.

Let's say for example it caches on something stupid like the CRC32 of the input image -- good enough that the couple dozen images in your test dataset don't collide, you don't see it in smoke testing your app, but real world data has collisions on a daily basis.

This gets into production and customer A sees a resized version of customer B's document for a thumbnail. Now customer A is wondering how many other customers are seeing resized versions of their private documents in thumbnail images. They are very very mad.

If the image resize service was built by "another team" then that other team is responsible for the bug and will take most of the heat for it. If it was built by an "agent swarm" or "gas town" or whatever under my direction then I'm 100% responsible for it and rightly deserve the heat.

That is why I cannot understand any approach that doesn't involve reading the code at all. Testing alone is not sufficient. MTTR is not sufficient because you can't make a customer less mad about a data privacy bug by fixing it.
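
For what it's worth, the CRC32 scenario is easy to reproduce. Below is a minimal Python sketch, not anyone's real service: `ResizeCache`, `find_crc32_collision` and `collision_probability` are hypothetical names made up for illustration, the "resize" is faked, and SHA-256 digests stand in for image bytes. It brute-forces two different inputs with the same CRC32 and shows a cache keyed on CRC32 handing back the wrong thumbnail, while the same cache keyed on a SHA-256 digest does not. The probability helper uses the standard birthday approximation and assumes uniformly distributed keys.

```python
import hashlib
import math
import zlib


def collision_probability(n_items: int, bits: int = 32) -> float:
    """Birthday approximation: chance that at least two of n_items
    uniformly distributed `bits`-bit keys collide."""
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2.0 * 2.0 ** bits))


def find_crc32_collision(max_tries: int = 2_000_000):
    """Brute-force two different byte strings that share a CRC32."""
    seen = {}
    for i in range(max_tries):
        # Stand-in for image bytes (an assumption for this sketch).
        data = hashlib.sha256(str(i).encode()).digest()
        key = zlib.crc32(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
    raise RuntimeError("no collision found; raise max_tries")


class ResizeCache:
    """Hypothetical thumbnail cache; key_func decides what counts as 'same image'."""

    def __init__(self, key_func):
        self._key_func = key_func
        self._store = {}

    def get_thumbnail(self, image: bytes) -> bytes:
        key = self._key_func(image)
        if key not in self._store:
            self._store[key] = b"thumb:" + image  # fake "resize"
        return self._store[key]


def sha256_key(image: bytes) -> bytes:
    return hashlib.sha256(image).digest()


if __name__ == "__main__":
    image_a, image_b = find_crc32_collision()

    buggy = ResizeCache(zlib.crc32)
    buggy.get_thumbnail(image_a)
    # Customer B's image now gets customer A's cached thumbnail.
    print("CRC32 key leaks:", buggy.get_thumbnail(image_b) == b"thumb:" + image_a)

    safer = ResizeCache(sha256_key)
    safer.get_thumbnail(image_a)
    print("SHA-256 key leaks:", safer.get_thumbnail(image_b) == b"thumb:" + image_a)

    for n in (24, 100_000):
        print(n, "images, collision probability:", collision_probability(n))
```

Under that approximation a couple dozen test images essentially never collide, while around 100,000 images collide more often than not, which matches the "passes smoke testing, breaks in production" shape of the bug.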

Practically, this is just about confidence values, anticipated blast radius and balancing testing vs review overhead.

I found this sort of odd. What is your point? Is it good or bad that another team was responsible in one scenario?

Two points.

1. You can treat software like a black box when other people developed it for you because they can stand behind it. They have their own reputations to uphold. You can't when AI developed it for you because YOU are responsible for 100% of the bugs in it. If you take this trendy stance of "I never read or write code, just specs", you are just rolling the dice on what you stamp your name on.

2. Just because you have unit tests and you've tested the software by clicking through the app doesn't mean you've found every bug. There have always been bug types, like the example checksum collision, that are easier to detect by reading the code than by running the code because it will work most of the time even though the approach is wrong.

But I'm already responsible for the bugs in my software. Also, who cares if someone else is responsible? And how does that align with OSS's "no warranty provided"?

> There have always been bug types, like the example checksum collision, that are easier to detect by reading the code than by running the code

AI seems radically, insanely more qualified to not write bugs like that. I'd bet that if you polled developers, 99% wouldn't be able to tell you what a CRC32 even is, let alone why it's insufficient as a cache key.

> But I'm already responsible for the bugs in my software. Also, who cares if someone else is responsible? And how does that align with OSS's "no warranty provided"?

The original example from Simon Willison referred not to pulling in a 3rd party library, but to working "at larger organizations" where "another team hands over something". In other words, we are all working on the same product for the same company, they have been assigned another part of it, and I'm expected to use their code.

In that scenario of course I care that someone else is responsible! It may affect whether I get fired or not!

It's different if you're a solo founder of a startup and for everything you ship, the buck stops with you. But proportionally many many more devs are in a situation where they are a cog in a machine.

> AI seems radically, insanely more qualified to not write bugs like that. I'd bet that if you polled developers, 99% wouldn't be able to tell you what a CRC32 even is, let alone why it's insufficient as a cache key.

I actually do agree that AI generally writes pretty good code. Doesn't mean I'm not gonna check. Sometimes it is too clever for its own good, such as re-implementing from scratch something that already exists and is well-proven.

The whole example is kind of contrived in the first place (how many environments don't have an excellent "image resizing" solution to reach for off the shelf?), so I hope you don't mind my bug example is also contrived.

But then, the ownership is clear. And no team would like to have it pointed out that their 5th iteration is also broken and can’t be relied on for production usage. That’s the difference with AI code. LLMs are not aligned with your goals. Any trust in them doing the right thing is very misguided.

That's why you have them write tons of tests. Way more than you generally would for human-written code. And the agent writing/maintaining the tests is not the agent fixing the bugs.

I've personally had an LLM write an image resizing library for me. It's a fairly basic one, I didn't need anything fancy. I could have used something off the shelf but it was at a time when I was testing what Claude could do. And to be honest, it just worked. One shot, if I recall correctly, or at least, one session with a few tweaks and never touched again. It's been embedded in a larger app for several months and I don't recall hitting a single bug with that, specifically. So I'm not sure your complaints about "the 5th iteration" being broken have much ground here.

> It's a fairly basic one, I didn't need anything fancy.

> one session with a few tweaks and never touched again

> and I don't recall hitting a single bug with that, specifically.

And there you've got your answer. If every scenario was as simple as that, we wouldn't really need software development teams. I'm not saying that you can't get good results with an LLM tool, but most software is in constant flux and software engineering is about keeping the cost of making new changes minimal.

So if you have a dependency, you want to treat it as a black box, because it lowers the cognitive load. But you don't want it to suddenly change its contract, including breaking it in some strange way. And that brings me to...

> That's why you have them write tons of tests.

Tests are not an implementation guarantee. They are a canary to warn about some errors. You assume the code is going to be written in good faith, but you place alert points to warn you about possible mistakes. Because you can't really test the full implementation without having a brittle test suite (which you have to maintain).

And tests rely on a lot of assumptions (mocks, initial cases, fakes, ...). Those should be treated with care. Because as soon as one is wrong, the test cases it affects are make-believe.

The only true testing of your software is done in production. Everything else is about avoiding the easy mistakes.
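
A small Python sketch of the make-believe test problem, under stated assumptions: `thumbnail_size` and its `get_dimensions` parameter are hypothetical, and the bug is planted deliberately. The mock only ever returns a landscape image, so the test passes even though the function is wrong for portrait input.

```python
from unittest import mock


def thumbnail_size(url, max_side, get_dimensions):
    """Scale an image so its longer side fits in max_side (hypothetical helper)."""
    width, height = get_dimensions(url)
    scale = max_side / width  # planted bug: assumes width is the longer side
    return round(width * scale), round(height * scale)


def test_thumbnail_fits_in_box():
    # The mock bakes in an assumption: every image is 800x600 landscape.
    fake_dimensions = mock.Mock(return_value=(800, 600))
    assert thumbnail_size("img.png", 200, fake_dimensions) == (200, 150)
    fake_dimensions.assert_called_once_with("img.png")


if __name__ == "__main__":
    test_thumbnail_fits_in_box()  # passes with the landscape-only mock
    # A portrait image, which the mock never produced, breaks the contract.
    width, height = thumbnail_size("img.png", 200, lambda url: (600, 800))
    print(width, height, "exceeds box:", max(width, height) > 200)
```

The test is green and says nothing about the portrait case; reading the one-line `scale` computation catches it faster than adding more tests on top of the same mock.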

Simon Willison’s analogy does not apply unless that other team was immediately fired after they delivered the image resize service, or (more commonly) the work was done by a one-off contractor. The difference is the trust model. We trust that our company has hired a competent team which maintains knowledge of the image resizing service, that they respond to bug reports and feature requests and that they know how to fix and implement those.

Now I have been on HN long enough to know that we used to despise code written by contractors which we now depend on.

Why does the team need to be "fired"?

The single person who did the service might just quit and go to another job. They might be external consultants that rotate away when the contract ends. It might be a SaaS service where you don't control the code at all - nor the composition of their team.

We have trusted services, contractors and teams within our companies before. Now suddenly _everyone_ has ALWAYS read and meticulously analyzed every single line of code they have ever imported to a project?

As your parent comment says, it’s about trust. People don’t hire contractors with low reputations. Same with SaaS services. That’s why you see so much stuff about branding and customer testimonials. It can be gamed, but usually works well enough.

LLMs have no reputation to lose. Their work may or may not be aligned with your goals and they can’t care if they messed up.

Personally, if my company had one person write a utility which my team depended on, and that person quit soon after delivery, I would be pissed. I would demand that my team take ownership of the utility and gain intimate knowledge of it, and I would voice my concerns to the management who made the decision to hand out a task like that to a single person. I would then inform that management about the concept of bus factor, and how they just violated best practices. Next time they decide to hand out a task like that to a single person, they should instead just hand it out to the team which is gonna rely on that utility.

This leaves out the part where you ask the original developer: "Why does this thing do that?"