Hacker News
by simonw 55 days ago
> Why use someone's project when you can just have the robot write your own?

I've been thinking about this a bunch recently, and I've realized that the thing I value most in software now isn't robust tests or thorough documentation - an LLM can spit those out in a few minutes. It's usage. I want to use software which other people have used before me. I want them to have encountered the bugs and sharp edges and sanded them down.

8 comments

Depth of use over the lifetime of an app is a quality all its own that is often not appreciated. A recurring pattern at $dayjob is that a new manager or director will join a business unit, declare an existing app the worst, most terrible, no-good, horrible app they've seen, and announce that they're going to fix it. A year and a half later the new app is finally delivered with 80% of the original functionality and a fresh set of bugs. The new dev team sees the surface functionality but misses a lot of the hard-earned nuance the old system accrued over time. This is a pattern that existed long before LLMs.
Good read!
An LLM most definitely cannot spit out robust tests or thorough documentation. It can spit out some tests or some documentation, but they will not cover the user perspective or edge cases unless those are already documented somewhere. That's verified by both experience and just thinking about it for two seconds.

The sanding down you refer to is what generates those tests and documentation.

> but they will not cover the user perspective or edge cases unless those are already documented somewhere

Are you suggesting that LLMs can't test for people who use screen readers? Keyboard-only users? Slow network requests?

You're acting like the issues an app faces are so bespoke to the actual app itself (and have absolutely no relation to existing problems in this space) that an LLM couldn't possibly cover it. And it's just patently wrong.

I'm not talking about keyboards or screen readers or any sort of input testing, I'm talking about how the software is used in practice.

If you disagree with that, I think the onus is on you to show me that an LLM could simulate the full context in which a user interfaces with software. That's a ridiculous claim.

Feel free to show literally any evidence for this claim.

I'm disagreeing with saying it's impossible across the board; I'm not saying it's universally possible.

lol. And you made the claim, not me. The burden of proof is on YOU.

No, that's not how burden of proof works.

The status quo is that this capability does not exist. Whoever makes a claim contradicting the status quo has the burden of proof. I can't prove a negative.

And even with your logic, I did not make the original claim, it was made by simon.

Your statement now also makes little sense. For any nontrivial software project, the usage patterns and interactions with other systems are complex enough that the code itself does not contain enough context to understand how it is used, or what the invariants are.

There may be very simple codebases where an LLM can actually give you "thorough documentation" or "robust tests", but those are rare.

> There may be very simple codebases where an LLM can actually give you "thorough documentation" or "robust tests", but those are rare.

It's not rare. I've built two dozen line-of-business apps in the last handful of years that were glorified CRUD apps. Every environment I've been in has had a mix of the two.

And even then, that's at odds with your absolute above. On top of being in a field that changes daily.

> Are you suggesting that LLMs can't test for people who use screen readers? Keyboard-only users? Slow network requests?

I don't think it's feasible to fully simulate the full depth of actual usage, given that (especially in the case of screen readers and the like) there's a great deal of combinatorial depth and context to the problem. Which screen readers, on which operating systems, and which users thereof?

I can’t tell if you’re being sarcastic or not
You're saying that every app on this planet has bespoke usages that can't be derived from the app itself? That's your claim or am I getting this wrong?
> the thing I value most in software now isn't robust tests or thorough documentation - an LLM can spit those out in a few minutes.

Can it if we stop defining "robust tests" as "a lot of test code lines" and "good documentation" as "lengthy documentation"?

I chose my words carefully. "Robust tests" are tests that provide high coverage and aren't flaky. "Thorough documentation" likewise is documentation that describes as much of the code as possible.

I didn't use the word good.

Yep. I realised the same. No one reads docs or goes through tests. Either way, it's easy to write useless tests. And easy to write useless docs. I don't think most even read the code. Now the difference is that it has become possible to write useless code.

So it's just the fact that others have already gone through the motions before I did. That's it really. I suppose in commercial settings, this is even more true and perhaps extends to compliance.

> No one reads docs, or goes through tests.

I regularly do both when trying to use a library, especially one unfamiliar to me.

Dare I say you're in the minority
I hope not. How else are you learning to use the library? The only other option is to read the source, which is also a good idea eventually, if something is unclear, but why would you _start_ there?
Ask an LLM.
Bad idea.

But even in that case, you're reading the documentation. Just through a nondeterministic, hallucinating search engine.

Maybe, but still a counterexample.
> No one reads docs

sooo uhh how do _you_ learn how to use a new library? just throw random shit at the wall until something sticks?

I feel similarly but IIUC I think that doesn’t strictly require an open source development model. I’ve benefited a huge amount from consuming and contributing to open source projects and I’m a bit worried that the “unit economics” changing might break some of the social dynamics upon which the ecosystem is built.
> an LLM can spit those out in a few minutes.

It may be able to spit out text that purports to be that, in a few minutes. But for most software, an LLM will not be able to spit out robust tests - let alone useful documentation. (And documentation which just replicates the parameter names and types is thorough...ly useless.)

That's why I said "thorough" and not "good".
So battle tested
I value software that reveals knowledge. The frontier LLMs were trained on all the code that institutions had been keeping to themselves. So they're revealing programming know-how on a scale that just wasn't possible with open source. LLMs are the ultimate Prometheus. Information is more accessible and useful now than it's ever been.
> The frontier LLMs were trained on all the code that institutions had been keeping to themselves.

Lolz! I haven’t encountered “code that institutions had been keeping to themselves” that got even remotely close to OSS in quality.

The code inside Google's Google3 repository is more consistently high quality than most of what I see in the Exterior World.

But there's no way that Google is releasing a model trained on it. Way too high of a risk of IP leakage.

I promise you, "the code that institutions had been keeping to themselves" is not nearly as special or good as you are implying here.
True.

I have worked for several decades at many companies, located in many countries on a few continents, from startups to some of the biggest companies in their fields. I have therefore seen many proprietary programs.

On average, proprietary programs are not better than open-source programs, but usually worse, because they are reviewed by fewer people and because frequently the programmers who write them may be stressed by having to meet unrealistic timelines for the projects.

Proprietary programs have greater quantity, not quality, because they are written by a greater number of programmers working full-time on them, while much work on open-source projects is done in spare time by people occupied with something else.

Many proprietary programs can do things which cannot be done by open-source programs, but only because of access to documentation that is kept secret in the hope of preventing competition.

Lawyers, and other people who do not understand how research and development is really done, put a lot of weight on the so-called "intellectual property" of a company, which they believe to be embodied in things like the source code of proprietary programs or the design files for some hardware. The reality is that I have never seen anything of substantial value in this so-called IP. Everywhere, what was really valuable in the know-how of the company was not the final implementation that could be read in some source code, but the knowledge about the many other solutions that had been tried before and worked worse or not at all. This knowledge was too frequently not written down in any documentation. Knowing which paths are dead ends is a great productivity boost for an experienced team, because any recent graduate could list many alternative ways of solving a problem, but most of them would not be the right choice in certain specific circumstances.

> On average, proprietary programs are not better than open-source programs, but usually worse, because they are reviewed by fewer people and because frequently the programmers who write them may be stressed by having to meet unrealistic timelines for the projects.

There's also the fact that when you write open-source code, you're writing for a friendly audience. I've often found myself writing the code, letting it rest for a few hours, then rewriting it so that it is easier to read. Sometimes, the code gets substantially rewritten before I push.

There's no cooling period when you write code during your 9-5 job: it works, it has the required test coverage, ship it and move on to the next task.

The claim is also just categorically untrue. The largest source of training data by far is publicly available code on e.g. GitHub, so it mostly just gives you a way to recycle already-available code, without crediting the author, while allowing you to pretend you own it.
So you're both saying all the alpha in Claude comes from open source devs like me? Even when I'm wrong I'm right.