by m132 17 days ago
The problem is, mastering accessibility, intuitiveness, compatibility, responsiveness, scalability, architecture, performance, and all those other less immediately visible, "forward-thinking" parts of UX/software development has always been difficult. Ultra high-level frameworks and now LLMs have, on the other hand, made it even easier to botch all of these and quickly roll out a half-baked MVP. The gap between "acceptable" and "decent" has thus been widening. As a protagonist of "decent", you have it increasingly harder competing against those pushing for "acceptable". And the push is understandable as well: it's MVPs that make money, and details only "increase customer satisfaction" at best (and these days, who even cares about customers?).

The end result is more crunch and a sharp decline in software quality, maybe even job satisfaction in general. As an (unfortunately anecdotal) example, I started to find myself using dev tools and uBlock every once in a while to fix up broken websites or remove elements that get in the way, and have heard from other people on here that they have been doing the same (https://news.ycombinator.com/item?id=47042747). All to restore basic functionality of websites I visit. This was never required back in the day; Flash and early web browsers didn't even offer the option to do this.

Another, less anecdotal example from a while ago: https://news.ycombinator.com/item?id=47390945

It gets worse when you realize that most of the money saved through these cuts only benefits the very top of the hierarchy.

6 comments

> LLMs have, on the other hand, made it even easier to botch all of these and quickly roll out a half-baked MVP

Compared to the status quo where people pretty much never consider these things, like accessibility, especially not for an MVP? How many people have never written an aria attribute? I would suspect 90%+ of people touching the frontend.

The difference with LLMs is that (1) they have a latent rigor for things that you weren't going to spend time caring about anyways and, more importantly, (2) you can encode these things into prompts (AGENTS.md) and processes so that they happen even when you weren't going to invest the time with or without AI. For a lot of people this only means collecting some generic "skills" they found online yet it's still much better than what they were going to do pre-AI.

That's why I think AI is saving software in some ways, not leading to worse software.

Or, asserting that AI will botch software might hold more weight with people who have already forgotten how dogshit software was pre-AI.

I can somewhat see your point, but it is generally accepted that wrong ARIA is worse than none, and LLM-assisted codebases, at least these days, only stick together thanks to testing; the more decent ones heavily emphasize in-depth human code reviews.

If our hypothetical developer hasn't used any accessibility-related tags before, what chance is there that those parts of the website will receive adequate testing?

Testing is an even more powerful subject here since we barely do it.

Testing is so hard that we'll agree that, e.g., TDD is great (e.g. ensure your tests actually test something, ensure your code is testable from the start) yet we never do it. And when we do write tests, we are on the hook to be eternally vigilant that they are not stale, that they test something real, that they are not redundant. And they often turn into an append-only file that you resent.

Meanwhile, AI is happy to write tests, do red-green TDD cycles, refactor them, prune them, update them, justify and defend them. It will even incidentally write tests for the most aloof vibe-coder by accident because they didn't specify otherwise.

Overnight, I went from never testing most of my side projects (except for, say, maybe unit tests in more straightforward things like a parser) to having everything tested end-to-end. Every time I make a new directional / architectural decision, the tests the AI writes also encode it at the test level to reinforce the decision.

It's strictly a better world for software because AI can write and maintain tests.

> LLM-assisted codebases, at least these days, only stick together thanks to testing

But tests also help humans and ensure human-written software is robust. We only don't test because they are so costly to write and maintain, and our software has always suffered for it. Or the tests become such an unmaintainable mess that our software is now worse because of it!

I've gotten a lot of value out of LLM tools but without extensive feedback and direction I've found even the newer version of Opus pretty bad at writing good tests. First drafts are full of tests with some of these characteristics:

- good test, wrong layer, would turn into a mess of "wait why'd tests over there blow up for changes over here"

- mostly-good test, subtle issue (yes, this is the status quo with most human-written tests, but the risk of not being careful is that you (or the agent you throw at crap) are now overconfident in your future changes)

- weird silly test like "assert that calling the six statements in this method in the right order does the right thing" without... actually calling the method itself... so it doesn't protect against changes to the method?
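The last shape in the list above is easy to miss in review, so here is a minimal sketch of it. The `Cart` class and both test names are hypothetical, invented only for illustration; the first test re-runs the method's statements by hand, the second actually calls the method.

```python
class Cart:
    """Hypothetical class, used only to illustrate the two test shapes."""

    def __init__(self):
        self.items = []
        self.total = 0

    def add(self, price, qty):
        self.items.append((price, qty))
        self.total += price * qty


def test_copies_the_method_body():
    # Anti-pattern: repeats the statements from Cart.add by hand and never
    # calls add(), so it keeps passing no matter how add() changes.
    cart = Cart()
    cart.items.append((5, 2))
    cart.total += 5 * 2
    assert cart.total == 10


def test_calls_the_method():
    # Exercises the real method, so a regression in add() fails here.
    cart = Cart()
    cart.add(5, 2)
    assert cart.total == 10
    assert cart.items == [(5, 2)]
```

If `Cart.add` were later broken, only the second test would notice, which is the whole problem with the first one.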

For non-greenfield work I've recently been much more happy with their out of the box code change quality than their attempts at adding coverage for those changes!

I think this is an area that has remained hard because putting directives in CLAUDE.md or whatnot for tests is generally gonna be so generic as to be useless, like "put tests in the right place" without more module-specific context. Whereas if I'm making a non-greenfield change, I'm thinking much more in my prompt about constraints on the code itself, and much less about the current shape/state/organization of the tests or what to direct it on.

Properly used it's great. Definitely improved my test coverage a lot.

But it's entirely still in the world of "people who'd care to write good code before will write good code faster; other people will just write mediocre code faster."

> Meanwhile, AI is happy to write tests, do red-green TDD cycles, refactor them, prune them, update them, justify and defend them. It will even incidentally write tests for the most aloof vibe-coder by accident because they didn't specify otherwise.

I read some AI generated tests and while it looks visually impressive, ultimately it wasn’t doing anything valuable. Why? Because of all the mocks and scenarios that didn’t matter. And on top of that, tests are additional code to maintain.

These days, I don’t even bother with unit testing. They are a maintenance burden. I focus on integration tests (whole modules) and, if I have the time, on a harness to do e2e testing.

Many humans do the same thing. They write tests that mock so much of the actual code, the "test" is not testing much at all, except perhaps the developer's ability to basically turn the code inside out. It's often just a large volume of crap that has to be maintained (or eventually deleted.)
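A minimal sketch of what "mocking so much that the test is not testing much" can look like. `Pricing` and `discounted_total` are hypothetical names made up for this example; the mock comes from the standard library's `unittest.mock`.

```python
from unittest.mock import MagicMock


class Pricing:
    """Hypothetical collaborator: 10 percent off orders of 100 or more."""

    def percent_off(self, order_total):
        return 10 if order_total >= 100 else 0


def discounted_total(pricing, order_total):
    """Hypothetical function under test."""
    pct = pricing.percent_off(order_total)
    return order_total - order_total * pct // 100


def test_over_mocked():
    # The mock supplies the answer, so this only checks how the mock is wired.
    pricing = MagicMock()
    pricing.percent_off.return_value = 10
    # Passes even though an order of 50 should get no discount at all.
    assert discounted_total(pricing, 50) == 45
    pricing.percent_off.assert_called_once_with(50)


def test_with_real_collaborator():
    # Uses the real pricing rule, so the threshold is actually exercised.
    pricing = Pricing()
    assert discounted_total(pricing, 50) == 50
    assert discounted_total(pricing, 200) == 180
```

Both tests pass, but only the second one says anything about the discount rule.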

I agree integration tests are best, with some e2e testing for common scenarios.

I worked at a place that required unit tests for every new method or function. Arguments like "this other (integration) test already covers that. why do I have to add another test?" wouldn't fly. PR reviews would often degrade into arguments about testing and how all database access needed to be mocked...

> I read some AI generated tests and while it looks visually impressive, ultimately it wasn’t doing anything valuable

I just saw this comment yesterday about one of the tests from Bun’s Rust rewrite: https://news.ycombinator.com/item?id=48314311 It reads in the raw source code and uses a regex to assert that “unsafe” is used.

> These days, I don’t even bother with unit testing. They are a maintenance burden

I’ve come to the same conclusion, but that’s only because I’m working on solo projects. I think they are probably worth it with multiple devs on the same project.

IMO with multiple devs the focus on integration (module-level) instead of unit (lower-level/helper-function-level) is even more important than on single-dev projects.

My instructions and steering are to force the LLM agent to focus on mocking only system boundaries and on outside-in unit testing. I've found these two make verbose but good tests that actually prove behavior pretty well.
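A rough sketch of that style, under the assumption that the only system boundary is a user store backed by a database. `InMemoryUserStore`, `normalize_email` and `register` are hypothetical names; the point is that only the boundary is faked and the test drives the public entry point.

```python
class InMemoryUserStore:
    """Fake for the system boundary (a real one would talk to a database)."""

    def __init__(self):
        self.rows = {}

    def exists(self, email):
        return email in self.rows

    def insert(self, email, name):
        self.rows[email] = name


def normalize_email(raw):
    # Internal helper: exercised through register(), never mocked.
    return raw.strip().lower()


def register(store, raw_email, name):
    """Public entry point; the test drives everything from here (outside-in)."""
    email = normalize_email(raw_email)
    if store.exists(email):
        raise ValueError("already registered")
    store.insert(email, name)
    return email


def test_register_rejects_duplicates_case_insensitively():
    store = InMemoryUserStore()
    assert register(store, "  Ada@Example.com ", "Ada") == "ada@example.com"
    try:
        register(store, "ADA@example.com", "Ada again")
    except ValueError:
        pass
    else:
        raise AssertionError("duplicate registration was accepted")
    assert store.rows == {"ada@example.com": "Ada"}
```

Because the helper is not mocked, the test can survive a refactor of `normalize_email` and still fail if duplicate detection stops working.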

But that was my testing strategy already, but writing one of those tests could take legitimately hours with how many columns and crazy rules our area has.

a11y testing is non-trivial. axe-core can automatically detect many types of issues. However, sufficient compliance (enough to avoid being sued) needs end-to-end testing and human judgement, e.g. for keyboard traps, focus restoration, alt-text, etc.
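As a rough illustration of where automation stops, here is a standard-library sketch (not axe-core, and not how axe-core is implemented) that flags `<img>` tags with no `alt` attribute. `MissingAltFinder` and `images_missing_alt` are hypothetical names.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects <img> tags that have no alt attribute at all."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src", "<no src>"))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


page = """
<img src="logo.png" alt="Acme logo">
<img src="chart.png">
<img src="photo.jpg" alt="image">
"""
print(images_missing_alt(page))  # prints ['chart.png']
```

The check catches the missing attribute on `chart.png`, but it happily accepts `alt="image"` on `photo.jpg`. Whether alt text is actually useful is the part that still needs a human.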

0% if by testing you mean "somebody who uses a screen reader regularly was able to use the product successfully" because nobody seems to do that.

> asserting that AI will botch software might hold more weight with people who have already forgotten how dogshit software was pre-AI.

You're responding to an assertion with an assertion. It has been empirically proven that SOTA models can create more dogshit software than pre-AI software. It is also trivially known that the user is unable to predict when and how the AI will introduce dogshit into the software. We literally had a study posted on this forum claiming models give more accurate answers if you're mean to them. This is the shit we're dealing with. Stuff you couldn't make up in a dystopian Douglas Adams novel.

> you can encode these things into prompts

Is this satire? SOTA models randomly disobey rules in prompts all the time.

When a dev drops a production db I can warn them. If they do it multiple times during their employment I can change their roles or fire them.

I can count the number of companies providing SOTA models with the fingers on my hands. Imagine having an employee pool of only 5 savant coders with dementia to choose from when hiring for your company. That's it. That's the entire applicant pool. You can only fire one of them by hiring one of the other four to replace them. And you can't really fire them for dropping production dbs if you can't prevent the other ones from making the same mistake. This is the current AI-first hellscape as it stands.

I would much rather have software that works but lacks accessibility features than software that's broken but also has some broken accessibility features sprinkled in. The former is useful to many people, while the latter is useful to no one.

But the key here is: LLMs don't have latent rigor, nor any other kind of rigor.

But software was already in a horrible state before AI, so your dichotomy doesn't work.

The status quo with pre-AI Human Written software on a pedestal is that it doesn't work, and it lacks accessibility, polish, performance considerations, UX considerations, tests, and more.

The built-in rigor is trivial to prove. Just put Opus 4.8 in plan mode and tell it to plan something, like a vt100 emulator.

The question isn't whether you can do better than AI, because you'll put your foot on the scale and give yourself infinite time, attention, and energy just so you can say yes. It's whether AI can do as good or better than you with the same time, attention, and energy you would have given a task in the first place.

In practice, you would use an already written implementation, maintained by somebody else. An option that is often ignored by LLMs (copy-paste galore).

For example, imagine if the textual-serve author reimplemented xterm.js. What effect would that have on quality?

LLMs increase technical debt rapidly. It is unclear whether they can deal with the mess they create. But we'll know soon (no need to wait years to get an immovable mess).

The positive side of LLMs is that they confirm experimentally the usefulness of many software engineering practices (testing, docs, ADRs, design, formal specs, etc.)

A couple months back, I had Sonnet build me a browser-based VT-100 terminal emulator. I then had it build me a websocket-to-telnet gateway for connecting some old retro systems to it. Both worked pretty well.

Could I have done it on my own? Yes, eventually. The problem is I would've lost interest and moved on to some other useless project before getting that far.

> software was already in a horrible state before AI, so your dichotomy doesn't work.

It depends on the software. But, generally speaking, I try to use and write the best software available that solves my problem, even if it's one of a kind; it doesn't really matter if the other 99.9999% of the software in the space is broken.

Given 1000 hours to work on a problem, an LLM will continue to yeet out mediocre variations on a theme. Give me 1000 hours to work, and my product will keep getting 1% or 2% better until it's much better than any shit an LLM would spit out.

Similarly, I would much rather use someone else's emulator that they spent 1000 hours on than have AI yeet out some mediocre shit that kinda works, but is really just a mindless exploitation of something that someone else wrote that was actually good.

Then, you follow that with, "Yeah, but AI just allows you to iterate faster and skip the boring stuff, so that you make your product better even faster."

And then I follow that with, "The part where you take it apart piece by piece and study each piece and get kicked in the head by the realities of your lack of understanding is the part that's actually valuable, and it's precisely what you're skipping with an LLM"

I think higher up in the thread it was already argued and agreed that the rigor is generally already present, and can be made doubly sure to exist via agents.md.

You get a lot of things for free with LLMs. You program an input box that goes to a form, handle the backend, make sure the variable names match the box, the API, the data structures and the database columns. All the brain matter that goes into making sure that flow works has been mastered by LLMs since GPT 3.5 (not that I am advocating using that anymore). But the point is, since that's taken care of, why not also let it handle aria, input hints, mouse effects, state storage and animations, since most of it is handled by putting "handle aria, input hints, mouse effects, state storage, animations any time the front end files are changed." in agents.md?

I would say that's rigor. Now you go a step further and have another LLM in your QA cycle that double checks it.

With the right harness/agent/skill setup and requirements work, this (any specificity) can become part of the workflow from early stages.

Setting that up and making sure it works, at the early stages, is not what LLMs do best, today. This sort of work is possibly incredibly valuable in the next 12-36 months, depending on what LLMs can be designed to do out of the box.

An Agent that can deliver the correct legal review process for a patent and is correct often enough to validate the savings vs real humans, or at least the speed increase + manual review à la code reviews, is incredibly valuable.

Are we there yet? Maybe this is where current groups like FNCR are trying to go.

> That's why I think AI is saving software in some ways, not leading to worse software.

So far the way AI is making software better is mostly by checking for common bugs and acting as a review/pair programming buddy, not by writing the code.

The code is bad. At best okay. But you can make a lot of it. And it's not bad enough to not be good enough.

I don't have the same rosy memory of the history as you do. I remember very regularly having to wrestle with Flash or Java applets, reinstalling them and doing all kinds of voodoo to get sites to run, sites I had used fine the day prior. Figuring it all out on my oh so slow connection and hoping no one calls the landline lest it stops all the progress. Not saying it wasn't a fun period of chaos, and every day I wish for the internet to become the wild west it once was. It was acceptable in the sense that I didn't know it could be better; I was used to the OS hanging and crashing regularly on my oh-so-fancy dual core system as well, so it was all acceptable.

Today I find myself having less patience for it all, because things just tend to work a lot more often than they don't.

With AI, I find that many small businesses and others I deal with that have their own websites have started to have better design and better performance, and while I don't know if they are really accessible, they tend to have more aria labels.

Not to take away from your experiences, just sharing my anecdotes.

Personally the biggest conceptual problem with HTML/CSS is that it's incredibly unintuitive with mapping to real-world concepts, and not low-level (or straightforward) enough so that the low-level primitives are easy to manipulate.

For example, WPF programs can have great accessibility - this comes from the fact that programmers can manipulate primitives which are semantically meaningful (buttons, lists etc).

In HTML it's just a sea of divs which lately have become an implementation detail to hang classes onto, and sometimes are not even exposed directly to the developer, if you use a SPA framework.

There's not a snowball's chance in hell you can build a quality accessibility solution on top of that.

HTML's main idea was that HTML was a document model that described the content in a semantically meaningful way, and CSS was the 'rest', which allowed you to make the whole thing pretty. The idea went as far as saying that separating content from looks should allow people to 'theme' their websites and completely redefine layout.

That idea has been completely abandoned. I'm not a frontend dev - although I can make a competent webpage by myself, I don't have deep insights into what's going on in the field - maybe Web Components will allow people to create semantically meaningful pages in a standard way?

Maybe there should be another layer of abstraction on top of what we have - which is truly semantic, this time for real?

The sea of divs and the abandonment of HTML's core ideas are a result of poorly built frameworks. Actual HTML and CSS have not abandoned those core ideas, and the frameworks often reimplement native features badly (e.g. the shadcn radio button example). HTML crafted with care is not a sea of divs; it is markup one can read that logically and cleanly describes what you see.

You're describing the state that software was already in prior to AI. Usability, accessibility, intuitiveness have all gone down the toilet in the blind drive to make everything "sleek" and "modern" by removing all text, reducing programs to hieroglyphs floating in a sea of padding. Removing features so somebody can say they changed something. We've refused the notion that software can ever be finished and now have to find some way to justify endlessly updating software forever.

AI might make the problem worse, it might not. But the problems you outline and the sharply declining computer literacy they bring are already here, driven 100% by human developers.

> As a protagonist of "decent", you have it increasingly harder competing against those pushing for "acceptable".

Some people go on a bicycle because they can't afford a car. Should car makers see those people as a problem?

> The end result is more crunch and a sharp decline in software quality

If you have 10 people eating steak and 10 people starving, then giving rice to the starving people would also sharply decrease dinner quality.

AI software is not the difference between good or bad, it's the difference between something or nothing.

> Some people go on a bicycle because they can't afford a car. Should car makers see those people as a problem?

Contrary to what you seem to believe, cars and bicycles are different kinds of things, not two versions of the same fundamental type, so this rhetorical question doesn't make much sense (consider that your legs also provide the function of transportation but are nevertheless not a kind of car).

> Contrary to what you seem to believe

Please stop doing this. That is not even close to what he said.

"Some people" do that? Not a controversial statement, and pretending otherwise is to argue in bad faith.

I'm afraid that there's never been a shortage of poor quality software.

This is why the 'craft' should be left to open source for most commercial software. The business reality just doesn't care for it.

Only when you have a PR problem does the business switch back to signalling quality, like Microsoft, although it remains to be seen if they still have the quality part. Most of the craftspeople get to say 'told you so' but also it looks like a sinking ship to them. Once the PR problem is gone, it's back to shipping at the expense of quality.

This cycle conflicts with the idea of a craft, which is that you should do it that way all/most of the time. The business will stop caring about quality long enough that your skills will erode, making it a bad mix. Trying to practice a craft where you aren't in control of this cycle is corrosive to the spirit.