Hacker News
by rcar1046 87 days ago
Thanks! At first I was using OpenAI's deep research to just give a summary and an overall score from 1 to 10, but I realized that approach would be neither iterative nor future-proof as new evidence comes to light.

So after some thought, I switched to a system of individual evidence gathering and weighting each piece of evidence. I've given the models some basic starting points for types of evidence (for instance a donation has a default weight of 8/10), but have given the models leeway to make relative judgements.

After all evidence is collected, the weights and confidence that the evidence is accurate (usually very high) are put into a formula to derive a final score. No recency bias. The nitty gritty:

-Each row contributes direction × weight × confidence × status_factor, where status_factor halves the contribution of disputed evidence and there is no recency decay.

-All signed contributions are summed into S, and total support mass goes into M. Final score is 50 + 50 * (S / (M + 4)), clamped to 0-100.

-That +4 prior mass keeps thin but unanimous evidence from producing extreme scores too easily.

-Neutral evidence (direction = 0) doesn’t push the score up or down, but it does increase M, which pulls the result back toward 50.
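
To make the formula concrete, here is a minimal sketch of the scoring as described above, not the actual implementation. The `Evidence` class, its field names, the 0-10 weight scale, the 0-1 confidence scale, and the reading of M as the sum of weight × confidence × status_factor over all rows are assumptions for illustration.

```python
from dataclasses import dataclass

PRIOR_MASS = 4.0       # the "+4" in the formula
DISPUTED_FACTOR = 0.5  # disputed evidence counts for half


@dataclass
class Evidence:
    direction: int         # +1 supports, -1 opposes, 0 neutral (assumed encoding)
    weight: float          # assumed 0-10 scale, e.g. 8 for a donation
    confidence: float      # assumed 0-1 scale
    disputed: bool = False


def score(rows):
    """Return a 0-100 score; 50 means no net evidence either way."""
    s = 0.0  # S: sum of signed contributions
    m = 0.0  # M: total mass, including neutral rows (assumed definition)
    for r in rows:
        status_factor = DISPUTED_FACTOR if r.disputed else 1.0
        mass = r.weight * r.confidence * status_factor
        s += r.direction * mass
        m += mass
    raw = 50 + 50 * (s / (m + PRIOR_MASS))
    return max(0.0, min(100.0, raw))  # clamp to 0-100


donation = Evidence(direction=1, weight=8, confidence=1.0)
neutral = Evidence(direction=0, weight=4, confidence=1.0)

print(round(score([]), 2))                   # 50.0
print(round(score([donation]), 2))           # 83.33
print(round(score([donation, neutral]), 2))  # 75.0
```

Under these assumptions, a single unanimous row lands at about 83 rather than 100 because of the prior mass, and adding a neutral row pulls the result back toward 50.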

As for the ladder: I think that is a good idea, but it would have to be done in a controlled manner because of the token cost and the potential for abuse.