HN Mirror


by numpad0 34 days ago

That[1] used a hand-picked set of ambiguous images and still got 60% overall accuracy across 11k participants. I don't know much about statistics[2], but 1) 60% HAS to be statistically significant, and 2) it was under ADVERSARIAL, not neutral, conditions. So people can tell.

Anyway, that's beside my point. My point is that it always turns into all-caps flamewars like this, with no middle ground or third camp, and that this has to be more of a phenomenon than a regular disagreement. This isn't bikeshedding. This is Spanish bullfighting centered around a piece of red cloth.

1: https://news.ycombinator.com/item?id=42216694

2: I just asked Gemini "is 60% accuracy over 11k participant for a test statistically significant and why", and it said "yes, it is overwhelmingly statistically significant" and "completely off the charts". It said the threshold for p < 0.05 would be 50.94%.
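A rough way to check the quoted threshold yourself is a one-sample test of a proportion against chance, using the normal approximation to the binomial. This is a minimal sketch under loud assumptions: it treats the 11k participants as 11k independent yes/no judgments with a 50% chance baseline, which is not necessarily how the original test was scored. The function names are made up for illustration.

```python
import math


def significance_threshold(n: int, p0: float = 0.5, z_crit: float = 1.96) -> float:
    """Smallest accuracy that rejects H0 (accuracy == p0) at a two-sided
    z cutoff, using the normal approximation to the binomial.
    ASSUMPTION: n independent binary judgments, chance level p0."""
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error of a proportion under H0
    return p0 + z_crit * se


def z_score(accuracy: float, n: int, p0: float = 0.5) -> float:
    """How many standard errors the observed accuracy sits above chance."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (accuracy - p0) / se


n = 11_000  # hypothetical: one judgment per participant
print(f"p < 0.05 threshold: {significance_threshold(n):.2%}")
print(f"z for 60% accuracy: {z_score(0.60, n):.1f}")
```

Under these assumptions the two-sided threshold comes out at about 50.93%, close to the figure Gemini gave, and 60% sits roughly 21 standard errors above chance. If each participant actually made many judgments, the effective sample is larger and the threshold moves even closer to 50%.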

1 comment

tskj 34 days ago

Hmm, yes, I had it backwards. I agree this is very statistically significant, but the effect size is tiny. Going from random chance (50%) to a mere 60% means that Scott proved, with high statistical significance, that people cannot reliably tell.
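Significance and effect size answer different questions: significance grows with the sample size, effect size does not. One conventional effect-size measure for comparing two proportions is Cohen's h. The comment doesn't say which measure it has in mind, so treat this as an illustrative sketch, again assuming a 50% chance baseline.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between two proportions on the
    arcsine-square-root scale. Rule of thumb: 0.2 small, 0.5 medium, 0.8 large."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# ASSUMPTION: chance accuracy is 50% (a binary human-vs-AI call).
h = cohens_h(0.60, 0.50)
print(f"Cohen's h for 60% vs 50%: {h:.2f}")
```

This gives h of about 0.20, the bottom of Cohen's "small" band, and the value would be the same with 100 participants or 100,000. That is the distinction being drawn here: a large sample makes a small effect very certain without making it any larger.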

Also, I'm not that worried about the adversarial conditions; any real-life conditions are likely adversarial in the relevant sense. No one is one-shotting a generation and serving you that; obviously the output has been selected for quality. I would call Scott's test not adversarial, but fair.