| Thanks! At first I was using OpenAI's deep research to just give a summary and an overall score from 1-10, but I realized that approach could not be iterated on or future-proofed as new evidence comes to light. So after some thought, I switched to a system of gathering individual pieces of evidence and weighting each one. I've given the models some basic starting points for types of evidence (for instance, a donation has a default weight of 8/10), but have given them leeway to make relative judgements. After all evidence is collected, each weight and the confidence that the evidence is accurate (usually very high) are put into a formula to derive a final score. No recency bias. The nitty-gritty: -Each row contributes direction × weight × confidence × status_factor, where status_factor halves disputed evidence and there is no recency decay. -All signed contributions are summed into S, and total support mass goes into M.
Final score is 50 + 50 * (S / (M + 4)), clamped to 0-100. -That +4 prior mass keeps thin but unanimous evidence from producing extreme scores too easily. -Neutral evidence (direction = 0) doesn’t push the score up or down, but it does increase M, which pulls the result back toward 50. As for the ladder, I think that is a good idea, but it would need to be done in a controlled manner because of the token cost and potential for abuse. |
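
For anyone who wants to play with the numbers, here is a minimal Python sketch of the formula as described above. It is not the actual implementation. The `Evidence` class, the `final_score` function, and the field names are hypothetical. It also rests on some assumptions the post does not spell out: direction is +1, -1, or 0; weight is on the 0-10 scale suggested by the "8/10" donation example; confidence is between 0 and 1; and M is the unsigned sum of weight × confidence × status_factor over every row, neutral rows included.

```python
from dataclasses import dataclass

PRIOR_MASS = 4.0  # the "+4" in the formula


@dataclass
class Evidence:
    direction: int          # assumed: +1 (supports), -1 (opposes), 0 (neutral)
    weight: float           # assumed: 0-10 scale, e.g. 8 for a donation
    confidence: float       # assumed: 0.0-1.0
    disputed: bool = False  # disputed evidence counts at half strength


def final_score(rows):
    """Return 50 + 50 * (S / (M + 4)), clamped to the 0-100 range."""
    s = 0.0  # S: sum of signed contributions
    m = 0.0  # M: total unsigned mass (assumed definition)
    for row in rows:
        status_factor = 0.5 if row.disputed else 1.0
        mass = row.weight * row.confidence * status_factor
        s += row.direction * mass
        m += mass
    raw = 50 + 50 * (s / (m + PRIOR_MASS))
    return max(0.0, min(100.0, raw))


if __name__ == "__main__":
    donation = Evidence(direction=1, weight=8, confidence=1.0)
    neutral = Evidence(direction=0, weight=8, confidence=1.0)
    print(final_score([]))                   # no evidence at all
    print(final_score([donation]))           # one supporting row
    print(final_score([donation, neutral]))  # the neutral row adds mass only
```

Under these assumptions, a single undisputed donation gives S = 8 and M = 8, so the score is 50 + 50 × 8/12 ≈ 83.3. Adding a neutral row of the same mass raises M to 16 and pulls the score back to 70, which matches the behaviour described for neutral evidence.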