Hacker News new | ask | show | jobs
by alsetmusic 78 days ago
> Things like does the site load, does it behave the same, did anything major change on the happy path, etc etc.

I asked Claude to build a web app to run locally polling data from the LAN. It fought me for four rounds of me telling it that the data from the api wasn’t rendered on the page. It created tests with mock data, it validated the api, it tested that the page loaded. It was gaslighting telling me that everything worked every time I told it that it didn’t work. I had to tell it to inspect the dom and take screenshots with Playwright to make it stop effing around. I don’t think it ever would have found the right response on its own.

Even after deliberate intervention, it regressed a few rounds later and stopped caring that tests failed. Whatever, I don’t treat it as anything more than a sometimes-correct random output machine.

1 comments

The thing you're missing is harness engineering.