Hacker News
by slrainka 408 days ago
Agent mode without rails is like a boat without a rudder.

What worked for me was coming up with an extremely opinionated way to develop an application and then generating instructions (mini milestones) by combining it with the requirements.

These instructions end up being very explicit about the sequence of things it should do (write the tests first), how the code should be written, where to place it, etc. So the output ended up being very similar regardless of the coding agent being used.
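To make that concrete, here is a rough sketch of what "combining conventions with requirements" could look like. Everything in it is an assumption for illustration: the `CONVENTIONS` values, the `Requirement` type, and the `milestone_instructions` helper are hypothetical names, not the actual setup described above.

```python
from dataclasses import dataclass

# Hypothetical conventions; the real opinionated rules are not spelled out in the post.
CONVENTIONS = {
    "test_dir": "tests",
    "src_dir": "src",
}


@dataclass
class Requirement:
    name: str         # short identifier, e.g. "create_user"
    description: str  # what the feature should do


def milestone_instructions(req: Requirement, conventions: dict = CONVENTIONS) -> list[str]:
    """Combine one requirement with fixed conventions into explicit, ordered steps."""
    test_path = f"{conventions['test_dir']}/test_{req.name}.py"
    src_path = f"{conventions['src_dir']}/{req.name}.py"
    return [
        # Tests come first, as the post suggests.
        f"1. Write failing tests for '{req.description}' in {test_path}.",
        # File placement is dictated by the conventions, not left to the agent.
        f"2. Implement the feature in {src_path}; do not modify any other file.",
        f"3. Run the tests in {test_path} and stop once they pass.",
    ]


if __name__ == "__main__":
    for step in milestone_instructions(Requirement("create_user", "create a user account")):
        print(step)
```

Because the steps are generated from the same conventions every time, the agent gets little room to improvise on ordering or file placement, which is presumably why the output looks similar across agents.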

1 comments

I've tried every variation of this very thing. I even managed to build a quick and dirty ticketing system whose tickets I could assign to the LLM of my choosing. WITH context. I'm talking graph diagrams of the codebase, mappings, tree structures of every possibility, simple documentation, complex documentation, a bunch of OSS that does this very thing automatically, etc.

In the codebase I've tried modularity via monorepo, faux microservices with local APIs, monoliths filled with hooks, and all the other centralization tricks in the book. Down to the very, very simple. Whatever I could do to bring down the context window needed.

Eventually... your returns diminish. And any time you saved is gone.

And by the time you've burned up a context window and you're ready to get out, you're expecting it to output a concise artifact to carry you to the next chat so you don't have to spend more context getting that thread up to speed.

Inevitably, the context window and the LLM's eagerness to touch shit that it's not supposed to (the likelihood of which increases with context) always get in the way.

Anything with any kind of complexity ends up in a game of too much bloat, or the LLM removing pieces that kill other pieces it wasn't aware of.

/VENT

So, relying on a large context can be tricky. Instead I’ve tried to get to an ER model quickly, and from there build modules that don’t have tight dependencies.
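A minimal sketch of how one might check that idea, assuming the ER model is written down as "entity references entity" and each entity is assigned to a module. The entities, the module names, and both helper functions are hypothetical examples, not taken from the guide.

```python
# Hypothetical ER model: entity -> entities it references (foreign keys).
ER_MODEL = {
    "User": [],
    "Order": ["User"],
    "OrderItem": ["Order", "Product"],
    "Product": [],
}

# Hypothetical assignment of entities to modules.
MODULES = {
    "User": "accounts",
    "Order": "orders",
    "OrderItem": "orders",
    "Product": "catalog",
}


def module_dependencies(er_model: dict, modules: dict) -> dict:
    """Return {module: set of other modules it references}, derived from the ER model."""
    deps = {m: set() for m in modules.values()}
    for entity, refs in er_model.items():
        for ref in refs:
            # Only references that cross a module boundary count as dependencies.
            if modules[ref] != modules[entity]:
                deps[modules[entity]].add(modules[ref])
    return deps


def mutual_dependencies(deps: dict) -> list:
    """Return sorted pairs of modules that depend on each other (a sign of tight coupling)."""
    return sorted((a, b) for a in deps for b in deps[a] if a < b and a in deps[b])


if __name__ == "__main__":
    deps = module_dependencies(ER_MODEL, MODULES)
    print(deps)
    print(mutual_dependencies(deps))
```

If `mutual_dependencies` comes back empty, each module can be handed to the agent with only its own entities plus the interfaces of the modules it references, which keeps the needed context small.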

Using Gemini 2.5 for generating instructions

This is the guide I use

https://github.com/bluedevilx/ai-driven-development/blob/mai...

How many tokens (across whole codebase) did it take for diminishing returns to kick in? What does the productivity vs token plot look like?