Hacker News
by monkeydust 4 days ago
I have a multi-agent (MA) system set up for personal use.

You give it a problem, then refine it: a fast, cheaper model asks you questions, and your answers produce a better input prompt. You then choose an MA strategy. For example, break the problem up into sections and have a final judge conclude, or run a multi-turn debate between agents and have a judge summarise it.
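A minimal sketch of how the refine step and the two strategies described above might look. Everything here is an assumption for illustration: the `LLM` callable, the model names, and the prompt wording are hypothetical and not taken from the actual system.

```python
from typing import Callable, List

# Hypothetical signature: (model_name, prompt) -> completion text.
LLM = Callable[[str, str], str]


def refine(llm: LLM, problem: str, answer: Callable[[str], str]) -> str:
    """A cheap model asks clarifying questions; the answers are folded into the prompt."""
    raw = llm("cheap-model", f"Ask clarifying questions, one per line:\n{problem}")
    questions = [q.strip() for q in raw.splitlines() if q.strip()]
    qa = "\n".join(f"Q: {q}\nA: {answer(q)}" for q in questions)
    return f"{problem}\n\nClarifications:\n{qa}"


def sections_then_judge(llm: LLM, prompt: str, sections: List[str]) -> str:
    """Solve each section separately, then let a judge draw the conclusion."""
    parts = [llm("worker-model", f"{prompt}\n\nFocus only on: {s}") for s in sections]
    return llm("judge-model", "Conclude from these analyses:\n" + "\n---\n".join(parts))


def debate_then_judge(llm: LLM, prompt: str, agents: List[str], turns: int = 2) -> str:
    """Agents respond in turn to the running transcript; a judge summarises the debate."""
    transcript: List[str] = []
    for _ in range(turns):
        for agent in agents:
            so_far = "\n".join(transcript)
            reply = llm(agent, f"{prompt}\n\nDebate so far:\n{so_far}\n\nYour turn.")
            transcript.append(f"{agent}: {reply}")
    return llm("judge-model", "Summarise this debate:\n" + "\n".join(transcript))


def fake_llm(model: str, prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return f"[{model}] saw {len(prompt)} chars"
```

Swapping `fake_llm` for a real client call is the only change needed to try this against actual models; the user-facing accept/reject/edit step is left out for brevity.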

The best approach is what I call 'all angles', where all these strategies run in parallel and a final meta-judge synthesises the response. The most useful part of this, which I recently added, is a view of the variance across the strategies.
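A rough sketch of the 'all angles' idea, under stated assumptions: each strategy is modelled as a plain function from prompt to answer, and "variance" is approximated by pairwise Jaccard distance between the word sets of the answers. Both choices are illustrative guesses, not how the actual system measures it.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Callable, Dict, Tuple

Strategy = Callable[[str], str]  # hypothetical: prompt -> answer


def jaccard_distance(a: str, b: str) -> float:
    """0.0 means identical word sets, 1.0 means no words in common."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return 1.0 - len(wa & wb) / len(union) if union else 0.0


def all_angles(
    prompt: str,
    strategies: Dict[str, Strategy],
    meta_judge: Callable[[Dict[str, str]], str],
) -> Tuple[str, Dict[Tuple[str, str], float]]:
    """Run every strategy in parallel, then synthesise and report disagreement."""
    with ThreadPoolExecutor(max_workers=max(1, len(strategies))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in strategies.items()}
        answers = {name: fut.result() for name, fut in futures.items()}
    # Assumed "variance view": how far apart each pair of strategies landed.
    disagreement = {
        (a, b): jaccard_distance(answers[a], answers[b])
        for a, b in combinations(answers, 2)
    }
    return meta_judge(answers), disagreement
```

A high distance between two strategies is a cue to read both answers rather than trust the synthesis alone.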

Been using this for life stuff - housing search, schools, family challenges!

Perhaps I should make a video of it in action. If people in the HN community are interested, let me know.

7 comments

Right, here is the video demo of what I built - https://streamable.com/e49cgt
Details and repo post on ShowHN here - https://github.com/monkeydust/rightmind
I have also developed a similar system, though not one focused on exploratory refinement of the prompt(s). Mine is focused on cybernetic-style feedback loops: keeping the prompt outputs stable through a growing library of deterministic checks and autofixes. Any "problem" that isn't covered by that library is surfaced to the human driving the process.
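To make that loop concrete, here is a minimal sketch. The `Check` type, the `stabilise` function, and the rule that any check still failing after autofixes goes to the human are all assumptions made for illustration, not a description of the real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Check:
    name: str
    passes: Callable[[str], bool]                    # deterministic test on the output
    autofix: Optional[Callable[[str], str]] = None   # deterministic repair, if one exists


def stabilise(output: str, library: List[Check], max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Apply autofixes until nothing fixable fails; return what a human must look at."""
    for _ in range(max_rounds):
        fixable = [c for c in library if c.autofix is not None and not c.passes(output)]
        if not fixable:
            break
        for check in fixable:
            # An earlier fix in this round may already have resolved this check.
            if not check.passes(output):
                output = check.autofix(output)
    # Anything still failing (no autofix, or the autofix did not help) is surfaced.
    unresolved = [c.name for c in library if not c.passes(output)]
    return output, unresolved
```

For example, with one check that strips stray whitespace and one that has no autofix:

```python
library = [
    Check("no_outer_whitespace", lambda s: s == s.strip(), str.strip),
    Check("mentions_total", lambda s: "total" in s.lower()),
]
fixed, for_human = stabilise("  Report  ", library)
```

Here `fixed` is `"Report"` and `for_human` is `["mentions_total"]`; each time a human resolves such a case, the fix can be added to the library as a new autofix.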
You mention cost in one of the replies. Can you elaborate on the cost profile (ballpark) for various problem types? I would also be curious to understand the strategies employed and what the costs look like across each.
Definitely interested, would love to see a video :)
Sure, let me do that. Can I post this as a Show HN if it's just a video? The rules say people need to be able to try it out, but that would cost me a small fortune :) ...I could perhaps post it on GitHub, and people can set up the repo themselves with their own OpenRouter key, if that works. I have never done a Show HN, but it would be fun to try.
The cheap models may ask subpar questions, leading to subpar solutions.
So what harness are you using? And which LLMs?
Homebrew harness, and all the frontier models plus DeepSeek, all via OpenRouter at the moment. It works well enough but can get expensive, so I use it for really high-value challenges. Interestingly, the refine feature has been the most useful to me and to the people I have shown it to. Essentially, people are lazy when expressing the initial problem (me included!): refine asks relevant questions about the initial problem and then rewrites the initial statement, which the user can accept, reject, or edit before submitting.
I came to a similar conclusion. I think the default options in many IDEs (Ask/Plan/Agent) are limited... 'Refine' feels like an improved 'Plan' in that it doesn't just jump right into building a list of tasks based on the initial prompt, because who knows what sort of flaws or deficiencies were present in the initial prompt! You can't always get everything right on the first try. XP

I don't think a specific harness is even necessary to get a boost from 'Refine'. Even a simple custom agent is portable enough... it's easy enough to take the existing 'Plan' agent definition present in VS Code and tweak it to be 'Refine' instead.

There is a five-line skill I’ve been using for refinement called grill-me that works quite well.
The problem with these kinds of systems (which have been well studied) is that the overall output is ultimately anchored to the dumbest model used.

I.e., you cannot get a more intelligent output by adding more models that are dumber than the most intelligent model used.

It's generally best to refine your prompt and send it to (at most) the two smartest frontier models available, and then have the smartest model review the output from the second smartest.
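As a sketch, that two-model arrangement could look like the following. The `LLM` callable, the function name, and the review prompt are hypothetical; the caller is assumed to supply the models already ranked strongest first.

```python
from typing import Callable, List

# Hypothetical signature: (model_name, prompt) -> completion text.
LLM = Callable[[str, str], str]


def draft_then_review(llm: LLM, refined_prompt: str, ranked_models: List[str]) -> str:
    """Second-strongest model drafts; the strongest model reviews that draft."""
    if len(ranked_models) < 2:
        raise ValueError("need at least two models, ranked strongest first")
    strongest, runner_up = ranked_models[0], ranked_models[1]
    draft = llm(runner_up, refined_prompt)
    review_prompt = (
        f"Task:\n{refined_prompt}\n\n"
        f"Draft answer:\n{draft}\n\n"
        "Review the draft, correct any errors, and return the final answer."
    )
    return llm(strongest, review_prompt)
```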