by latexr 111 days ago
> All verdicts are LLM-scored, not human-verified.

In other words, could be all slop. Or maybe it’s not. Maybe it’s mixed. No one knows.

2 comments

Fair critique. The methodology doc covers this: both pipelines agree on the high-confidence clusters (security vulnerabilities, bubble predictions) even though they disagree on edge cases. The repo is public specifically so people can spot-check. If you find a claim where the scoring is wrong, I'd genuinely like to know.
> If you find a claim where the scoring is wrong, I'd genuinely like to know.

So you’re asking me to do the work you should have done in the first place? If you didn’t put any effort into it, why should I waste my time checking your non-work and correcting it to your credit?

If you had actually put in the effort then sure, I’d be amenable to helping make this the best it can be. But you didn’t, so what’s the point? Why should anyone spend their time fixing other people’s slop?

I am curious whether claims are scored more accurately by LLMs when reviewed and edited by LLMs prior to posting the claim.