| HN Mirror



	by not2b 65 days ago
	If the result is statistically significant, it just barely makes it. 84.8% isn't that much higher than 80.8% and they had only 250 prompts, if I'm reading this right.


tgv 65 days ago

In a field where progress is measured in tenths of a percentage point, that's not true. Think of it this way: the error rate drops from about 19% to about 15%, or from roughly 1 in 5 to roughly 1 in 6.5.

danparsonson 64 days ago

Statistical significance is about whether an effect can reliably be said to have been measured at all; it's not about whether or not the effect itself would be significant in the sense of moving some other needle.

The ~5% improvement reported here might just be an artefact of the data collection or random variation, rather than a consistent repeatable change.

tgv 64 days ago

I know what significance means, and I also know that getting it from a p-value is nonsensical.

> The ~5% improvement reported here might just be an artefact of the data collection or random variation, rather than a consistent repeatable change.

You're questioning method or data representativeness, not significance. 250 samples is just about enough to detect a 5% difference in NHST (the standard deviation is around 0.4, so 1.64 sigma is 0.4/15.8 × 1.64 ≈ 0.04 for one-sided testing, where 15.8 ≈ √250).
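
For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes each prompt is an independent pass/fail trial, uses the normal approximation, and takes 80.8%, 84.8% and n = 250 from the thread. The function names are made up for illustration. The second function is an extra assumption not made in the comment above: it treats both accuracies as noisy estimates from 250 prompts each, rather than treating 80.8% as a fixed baseline.

```python
import math


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def one_sample_z(p_new: float, p_base: float, n: int):
    """One-sided z-test of p_new against a fixed baseline p_base."""
    sd = math.sqrt(p_base * (1 - p_base))  # per-prompt standard deviation
    se = sd / math.sqrt(n)                 # standard error of the mean
    z = (p_new - p_base) / se
    return sd, se, z, normal_sf(z)


def two_sample_z(p_new: float, p_base: float, n: int):
    """One-sided z-test when both accuracies come from n independent prompts."""
    se = math.sqrt(p_new * (1 - p_new) / n + p_base * (1 - p_base) / n)
    z = (p_new - p_base) / se
    return se, z, normal_sf(z)


sd, se, z1, p1 = one_sample_z(0.848, 0.808, 250)
print(f"one-sample: sd={sd:.3f} se={se:.4f} 1.64*se={1.64 * se:.4f} z={z1:.2f} p={p1:.3f}")

se2, z2, p2 = two_sample_z(0.848, 0.808, 250)
print(f"two-sample: se={se2:.4f} z={z2:.2f} p={p2:.3f}")
```

Under the fixed-baseline assumption z comes out around 1.6, right at the edge of the one-sided 5% threshold of 1.64. If both numbers are treated as independent estimates, z falls to around 1.2. If the two systems were scored on the same 250 prompts, a paired test on per-prompt results would be the better fit, but that needs data the thread does not give.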

not2b 64 days ago

Yes, it looks just barely significant. Results that are on the edge like that often aren't reproducible.
