
by monkeydust 4 days ago

I have a multi-agent (MA) system set up for personal use.

You give it a problem, then refine it: a fast, cheaper model asks you questions, and your answers produce a better input prompt. You then choose an MA strategy. For example, break the problem into sections and have a final judge conclude, or run multiple turns where agents debate and a judge then summarises the debate.

The best approach is what I call 'all angles', where all these strategies run in parallel and a final meta-judge synthesises the response. The most useful part, which I recently added, is a view that shows the variance in each strategy.
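A minimal sketch of how an 'all angles' run could be wired up, assuming two strategies (sectioned and debate) and a crude word-overlap score as one possible reading of "variance". Every name here (`call_model`, `sectioned`, `debate`, `disagreement`, `all_angles`) is hypothetical, and `call_model` is an offline stub rather than a real model call.

```python
import asyncio
from itertools import combinations


async def call_model(role: str, prompt: str) -> str:
    # Hypothetical stand-in for a real model call; returns canned text
    # so the sketch runs offline.
    await asyncio.sleep(0)
    return f"[{role}] {prompt}"


async def sectioned(problem: str) -> str:
    # Split the problem into sections, solve each, then let a judge conclude.
    sections = [f"{problem} (part {i})" for i in (1, 2)]
    parts = await asyncio.gather(*(call_model("solver", s) for s in sections))
    return await call_model("judge", " | ".join(parts))


async def debate(problem: str, turns: int = 2) -> str:
    # Two agents argue for a few turns, then a judge summarises the transcript.
    transcript = problem
    for _ in range(turns):
        pro, con = await asyncio.gather(
            call_model("pro", transcript), call_model("con", transcript)
        )
        transcript += f"\n{pro}\n{con}"
    return await call_model("judge", transcript)


STRATEGIES = {"sectioned": sectioned, "debate": debate}


def disagreement(a: str, b: str) -> float:
    # Assumed variance proxy: Jaccard distance between the word sets.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return 1.0 - len(wa & wb) / len(union) if union else 0.0


async def all_angles(problem: str):
    # Run every strategy in parallel, score how far apart they landed,
    # then hand all answers to a meta-judge.
    names = list(STRATEGIES)
    answers = await asyncio.gather(*(STRATEGIES[n](problem) for n in names))
    by_strategy = dict(zip(names, answers))
    spread = {
        (x, y): disagreement(by_strategy[x], by_strategy[y])
        for x, y in combinations(names, 2)
    }
    final = await call_model("meta-judge", "\n".join(answers))
    return final, by_strategy, spread
```

Calling `asyncio.run(all_angles("which school?"))` returns the synthesised answer, the per-strategy answers, and the pairwise spread, which is roughly what a variance view would need to display.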

Been using this for life stuff - housing search, schools, family challenges!

Perhaps I should make a video of it in action. If people in the HN community are interested, let me know.

7 comments

monkeydust 4 days ago

Right, here is the video demo of what I built - https://streamable.com/e49cgt


monkeydust 3 days ago

Details and repo posted on Show HN here - https://github.com/monkeydust/rightmind


ethanwillis 4 days ago

I have also developed a similar system, though not one focused on exploratory refinement of prompts. Mine is more focused on cybernetic-style feedback loops: it keeps the prompt outputs stable through a growing library of deterministic checks and autofixes. Any "problem" that isn't covered by that library is surfaced to the human driving the process.
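A minimal sketch of what such a check-and-autofix loop might look like. The `Check` type, the `stabilise` function, and both example checks are hypothetical, and "not covered by the library" is approximated here as "a check fails and no autofix resolves it".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    name: str
    passes: Callable[[str], bool]                    # deterministic test
    autofix: Optional[Callable[[str], str]] = None   # optional repair


def stabilise(output: str, library: list[Check]) -> tuple[str, list[str]]:
    # Run each check in order; try its autofix on failure, and collect
    # the names of checks that still fail so a human can look at them.
    escalate = []
    for check in library:
        if check.passes(output):
            continue
        if check.autofix is not None:
            output = check.autofix(output)
        if not check.passes(output):
            escalate.append(check.name)
    return output, escalate


# Illustrative library; a real one would grow as new failure modes appear.
LIBRARY = [
    Check("no_trailing_whitespace", lambda s: s == s.rstrip(), str.rstrip),
    Check("under_200_chars", lambda s: len(s) <= 200),
]
```

With this library, `stabilise("hello  ", LIBRARY)` fixes the trailing whitespace silently, while an over-long output comes back with `under_200_chars` in the escalation list.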


chrisss395 4 days ago

You mention cost in one of the replies. Can you elaborate on the cost profile (ballpark) for various problem types? I would also be curious to understand the strategies employed and what the costs look like across each.


Folcon 4 days ago

Definitely interested, would love to see a video :)


monkeydust 4 days ago

Sure, let me do that. Can I post this as a Show HN if it's just a video? The rules say people need to be able to try it out, but that would cost me a small fortune :) ... I could perhaps post it on GitHub so people can set up the repo themselves with their own OpenRouter key, if that works. I have never done a Show HN, but it would be fun to try.


whattheheckheck 4 days ago

The cheap models may ask subpar questions, leading to subpar solutions.


uxhacker 4 days ago

So what harness are you using? And which LLMs?


monkeydust 4 days ago

Homebrew harness, and all the frontier models plus DeepSeek, all via OpenRouter at the moment. It works well enough but can get expensive, so I use it for really high-value challenges. Interestingly, the refine feature has been the most useful to me and to the people I have shown it to. Essentially, people are lazy when expressing the initial problem (me included!): refine asks relevant questions about the initial problem, then rewrites the initial statement, and the user can accept, reject, or edit it before submitting.
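A rough sketch of the refine step as described: questions, answers, a rewritten statement, then accept/reject/edit. All names are hypothetical. `ask_questions` is a stub standing in for the fast, cheaper model, and the rewrite is plain string concatenation here, whereas a real system would presumably have a model do it.

```python
from typing import Callable


def ask_questions(problem: str) -> list[str]:
    # Hypothetical stub for the cheap model that generates clarifying questions.
    return [
        f"What constraints apply to: {problem}?",
        "What does a good outcome look like?",
    ]


def refine(
    problem: str,
    answer_fn: Callable[[str], str],
    review_fn: Callable[[str], tuple[str, str]],
) -> str:
    # 1. Ask clarifying questions and collect the user's answers.
    questions = ask_questions(problem)
    answers = [answer_fn(q) for q in questions]
    # 2. Fold the answers into a refined statement (assumed format).
    details = "; ".join(f"{q} {a}" for q, a in zip(questions, answers))
    draft = f"{problem}\nContext: {details}"
    # 3. Let the user accept, edit, or reject the draft before submitting.
    decision, edited = review_fn(draft)
    if decision == "accept":
        return draft
    if decision == "edit":
        return edited
    return problem  # reject: fall back to the original statement
```

In an interactive tool, `answer_fn` and `review_fn` would wrap user input; passing them in as callables keeps the flow testable without a UI.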


Cherub0774 4 days ago

I came to a similar conclusion. I think the default options in many IDEs (Ask/Plan/Agent) are limited... 'Refine' feels like an improved 'Plan' in that it doesn't just jump right into building a list of tasks based on the initial prompt, because who knows what sort of flaws or deficiencies were present in the initial prompt! Can't always get everything right in the first try. XP

I don't think a specific harness is even necessary to get a boost from 'Refine'. Even a simple custom agent is portable enough... it's easy enough to take the existing 'Plan' agent definition present in VS Code and tweak it to be 'Refine' instead.


SOLAR_FIELDS 4 days ago

There is a 5-line skill I’ve been using for refinement called grill-me that works quite well.


saberience 3 days ago

The problem with these kinds of systems (they have been well studied) is that the overall output is ultimately anchored to the dumbest models used.

I.e., you cannot get a more intelligent output by adding more models that are dumber than the most intelligent model used.

It's generally best to refine your prompt and send it to (at most) the two smartest frontier models available, and then have the smartest model review the output from the second smartest.
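As a sketch, that draft-then-review setup is only a few lines. The `Model` type, the function name, and the review prompt wording are assumptions; each model is passed in as a plain callable so no particular API is implied.

```python
from typing import Callable

# A model is assumed to be any function from prompt text to answer text.
Model = Callable[[str], str]


def draft_then_review(prompt: str, second_best: Model, best: Model) -> str:
    # The second-smartest model drafts; the smartest model reviews the draft.
    draft = second_best(prompt)
    review_prompt = (
        "Review and correct this answer.\n"
        f"Question: {prompt}\n"
        f"Answer: {draft}"
    )
    return best(review_prompt)
```

This uses two model calls per problem, compared with the many calls a parallel multi-strategy run needs.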
