by itissid 54 days ago
RPI/QRSPI-like skills for working with Claude Code (e.g. https://github.com/mattpocock/skills and https://github.com/humanlayer/humanlayer/tree/main/.claude/c... and https://github.com/dfrysinger/qrspi-plus ) work well enough for me that they can reliably* produce code that matches the plan/spec, in a way they could not before December 2025.

I have a gut feeling that these models can do just as well. Has anyone run a reasonably sized task (at least 1-2 days of design and planning) with these models and seen it work well?

* What worked well for me was the grill-me skill (or a variation of it) at the design stage. The hygiene I followed: have it ask one question at a time, resolve dependencies during design, and read the hashed-out plan closely. I also used a couple of other MCP tools for grounding, such as a documentation server like DeepWiki, and arXiv. Other tricks I use: high-signal tests, and either letting Claude read logs and code at the same time or embedding it in the execution (e.g. as a debugger, REPL, or devtools).


Are you talking about a single prompt that runs for 24 hours, or 8 hours of developer time spent in a single session?
No, duplicate the whole task. For example, I use the grill-me skill for planning; it takes me ~3 hours and CC asks me 20-40 questions. Run the same grill-me session with these models and compare the outcomes. I admit it's quite a lot of work to duplicate, but I'm really itching to do this over a few tasks and compare the final plans. I just need the time.