by danenania 814 days ago
I'm working on a somewhat similar project: https://github.com/plandex-ai/plandex

While the overall goal is to build arbitrarily large, complex features and projects that are too much for ChatGPT or IDE-based tools, another aspect I've put a lot of focus on is how to handle mistakes and corrections when the model starts going off the rails. Changes accumulate in a protected sandbox separate from your project files. A diff-review TUI lets you reject bad changes. All actions are version-controlled, so you can easily go back and try a different approach, and branches let you explore multiple approaches.

I think nailing this developer-AI feedback loop is the key to getting authentic productivity gains. We shouldn't just ask how well a coding tool can pass benchmarks, but what the failure case looks like when things go wrong.


How open are you to moving plandex cloud over to AGPL? I know, tough ask right out the gate! Think about that one for a bit.

How is your market testing going?

Do you have contracts with clients amenable to letting you write case studies? Do you need help selling, designing, or fulfilling these kinds of pilot contracts?

What are your plans for docs and PR?

As a researcher, it's currently hard to situate plandex against existing research, or anticipate where a technical contribution is needed.

As a business owner, it's currently hard to visualize plandex's impact on a business workflow.

Are you open to producing a technical report? It could detail the plandex methodology, efficiency benchmarks, ablation tests for key contributions, customer case studies, relevant research papers, and next steps/help needed.

What do you think?

If plandex is interested in being a fully open org, then I'd be interested in seeing it find its market footing and grow its technical capabilities. We need open source orgs like this!

It’s AGPL licensed already :)
Did I miss the plandex-cloud repo? It seems like it's proprietary at this time. I couldn't find the AWS design, billing system, user dashboards, and admin dashboards.

Can you point me to the missing code?

You need to make yourself a business analyst agent to provide the feedback! To make it real, perhaps a team of them with conflicting personalities.
I think we'll get there at some point, but one thing I've learned from this project is how difficult it is to stack AI interactions. Each little bit of AI-based logic that gets added tends to fail terribly at first. Only after a long period of intense testing and iteration does it become remotely usable. The more you are combining different kinds of tasks, the more difficult it gets.
Does it work with a large existing codebase?
Yes, at least up to the point of the context limit of the underlying model. If you needed to go beyond that, you would break the work up into separate "plans" (a plan is a set of tasks with an attached context and conversation).
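As a rough illustration of the plan concept described above, here is a hypothetical Go data model with a context-budget check. The field names, the ~4-characters-per-token heuristic, and the 128000 limit are all assumptions for the sketch; a real tool would count with the model's own tokenizer. When Fits returns false, that is the point where you would split the work into separate plans.

```go
package main

import "fmt"

// ContextItem is one loaded source: a file, URL, or piped text (hypothetical type).
type ContextItem struct {
	Source string
	Text   string
}

// Plan follows the description above: a set of tasks with an attached
// context and conversation. Field names are illustrative, not Plandex's schema.
type Plan struct {
	Name         string
	Tasks        []string
	Context      []ContextItem
	Conversation []string
}

// estimateTokens uses a crude ~4 characters per token rule of thumb.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// EstimatedTokens sums the context and conversation that would be sent to the model.
func (p Plan) EstimatedTokens() int {
	total := 0
	for _, c := range p.Context {
		total += estimateTokens(c.Text)
	}
	for _, m := range p.Conversation {
		total += estimateTokens(m)
	}
	return total
}

// Fits reports whether the plan stays within a model's context limit.
func (p Plan) Fits(limit int) bool {
	return p.EstimatedTokens() <= limit
}

func main() {
	p := Plan{
		Name:  "datagrid",
		Tasks: []string{"load data via fetchFooBars", "render it in a datagrid"},
		Context: []ContextItem{
			{Source: "lib/api.ts", Text: "export async function fetchFooBars() { /* ... */ }"},
		},
		Conversation: []string{"Display the data in a datagrid."},
	}
	// 128000 is an example limit, not a claim about any particular model.
	if p.Fits(128000) {
		fmt.Println(p.Name, "fits:", p.EstimatedTokens(), "estimated tokens")
	} else {
		fmt.Println(p.Name, "is too large; split the work into separate plans")
	}
}
```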

The general workflow is to load some relevant context (could be a few files, an entire directory, a glob pattern, a URL, or piped in data), then send a prompt. Quick example:

  plandex new
  plandex load components/some-component.ts lib/api.ts package.json https://react.dev/reference/react/hooks
  plandex tell "Update the component in components/some-component.ts
    to load data from the 'fetchFooBars' function in 'lib/api.ts' and
    then display it in a datagrid. Use a suitable datagrid library."
From there the plan will start streaming. Existing files will be updated and new files created as needed.

One thing I like about it for large codebases, compared to IDE-based tools I've tried, is that it gives me precise control over context. A lot of tools try to index the whole codebase, and it's pretty opaque: you never really know what the model is working with.