by CuriouslyC 295 days ago
Hook up Playwright over MCP, have a detailed spec, and tell Claude to generate a detailed test plan for every story in the spec, then keep iterating on a test -> fix -> ... loop until every single component has been fully tested. If you get Claude to write all the components (usually by subfolder) out to todos, there's a good chance it'll go >1 hour before it tries to stop, and if you have an anti-stopping hook it can go quite a bit longer.
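For anyone who wants to seed that todo list by hand instead of asking the agent to write it, here is a minimal sketch. It is a variation on the workflow above, not the commenter's exact method. The `TODO.md` file name, the checklist wording, and both function names are assumptions made up for illustration.

```python
from pathlib import Path


def component_todos(root: Path) -> list[str]:
    """Return one unchecked checklist line per visible subfolder of root."""
    return [
        f"- [ ] Test and fix component: {child.name}"
        for child in sorted(root.iterdir(), key=lambda p: p.name)
        # Skip plain files and hidden folders such as .git
        if child.is_dir() and not child.name.startswith(".")
    ]


def write_todo_file(root: Path, todo_path: Path) -> int:
    """Write the checklist to todo_path and return how many items it holds."""
    lines = component_todos(root)
    todo_path.write_text("\n".join(lines) + "\n")
    return len(lines)
```

Something like `write_todo_file(Path("src"), Path("TODO.md"))` would give the agent one checkbox per component folder to work through, assuming the components really do live one per subfolder.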

You've got to be doing the most unoriginal work on the planet if this doesn't produce a bowl of dysfunctional spaghetti.
Every sentence you will ever write in your entire life will be made from a finite set of letters. The magic is in how you arrange them.

If you have a really detailed, well-thought-out spec, you do TDD, and you have regular code review and refactor loops, agentic coding stays manageable.

It takes an incredibly detailed spec to get an LLM to not go completely off the rails, and even then it's not guaranteed. The amount of time writing that spec can take more time than just doing it by hand.

There is way too much babysitting with these things.

I’m sure somehow somebody makes it work but I’m incredibly skeptical that you can let an LLM run unsupervised and only review its output as a PR.

  > The amount of time writing that spec can take more time than just doing it by hand.
One thing about doing it by hand is that you also notice holes/deficiencies in the spec and can go back, update it, and make the product better. Just throwing it at an LLM 'til it's perfect-to-spec probably means it's going to be average quality at best...

Though tbh most software isn't really 'stunning' imo, so maybe that's fine as far as most businesses are concerned... (sad face)

Can you elaborate on what you mean by an anti-stopping hook? Sometimes I take breaks, go on walks, etc., and it would be cool if Claude tried different things, and even branches etc., that I could review when I get back.
Basically, all LLMs are "lazy" to some degree and are looking for ways to terminate responses early to conform to their training distribution. As a result, sometimes an agent will want to stop and phone home even if you have multiple rows of all caps saying DO NOT STOP UNTIL YOUR ENTIRE TODO LIST IS COMPLETE (seriously). Claude Code has a hook for when the main agent and subagents try to stop, and you can reject their stop attempt with a message. They can still override that message and stop, but the change of turn and the fresh "DO NOT STOP ..." that's at the front of context seem to keep it revving for a long time.
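A rough sketch of what such a hook script could look like, in Python. Treat the details as assumptions: as far as I understand the Claude Code hooks docs, a command registered for the Stop (and SubagentStop) event can print a JSON object with `"decision": "block"` and a `"reason"` to reject the stop, but check the current docs before relying on that. The `TODO.md` checklist and the helper names are hypothetical and not something Claude Code provides.

```python
#!/usr/bin/env python3
"""Sketch of a stop hook that rejects a stop while checklist items remain."""
import json
from pathlib import Path

# Hypothetical: a markdown checklist the agent was told to keep up to date.
TODO_FILE = Path("TODO.md")


def unfinished_items(todo_text: str) -> list[str]:
    """Return the text of every unchecked '- [ ]' item, in file order."""
    items = []
    for line in todo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- [ ]"):
            items.append(stripped[len("- [ ]"):].strip())
    return items


def decide(todo_text: str) -> dict:
    """Build the hook response; an empty dict means the stop is allowed."""
    remaining = unfinished_items(todo_text)
    if not remaining:
        return {}
    return {
        # Assumed response shape for rejecting a stop attempt.
        "decision": "block",
        "reason": (
            f"DO NOT STOP: {len(remaining)} todo item(s) remain. "
            f"Next up: {remaining[0]}"
        ),
    }


def main() -> None:
    todo_text = TODO_FILE.read_text() if TODO_FILE.exists() else ""
    response = decide(todo_text)
    if response:
        # Print only when rejecting, so a finished list lets the agent stop.
        print(json.dumps(response))


if __name__ == "__main__":
    main()
```

Because the script stays quiet once every box is checked, the agent can actually finish. A hook that always blocks would keep it running with nothing left to do.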