| HN Mirror



	by hntrader 1940 days ago
	Are you presenting real samples and GPT2 samples to users with equal probability? EDIT: another poster guessed GPT2 each time and found the frequency was 80 percent.

2 comments

moyix 1940 days ago

It should be equal: there are 1000 real and 1000 generated samples in the database, retrieved via:

SELECT id, code, real FROM code ORDER BY random() LIMIT 1
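For anyone who wants to check that the query above draws both classes evenly, here is a minimal sketch. It assumes an in-memory SQLite database (the real site's database engine is not stated in this thread), a `real` column stored as 1/0, and placeholder snippet text. The helper names `build_db` and `fraction_generated` are made up for illustration.

```python
import sqlite3


def build_db(n_real=1000, n_generated=1000):
    """Create a stand-in `code` table with the row counts quoted above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE code (id INTEGER PRIMARY KEY, code TEXT, real INTEGER)"
    )
    # Placeholder contents; only the real/generated flag matters here.
    rows = [(f"real snippet {i}", 1) for i in range(n_real)]
    rows += [(f"generated snippet {i}", 0) for i in range(n_generated)]
    conn.executemany("INSERT INTO code (code, real) VALUES (?, ?)", rows)
    return conn


def fraction_generated(conn, draws=1000):
    """Run the query from the comment `draws` times; return the generated share."""
    generated = 0
    for _ in range(draws):
        _id, _code, real = conn.execute(
            "SELECT id, code, real FROM code ORDER BY random() LIMIT 1"
        ).fetchone()
        if real == 0:
            generated += 1
    return generated / draws


conn = build_db()
frac = fraction_generated(conn)
print(f"fraction generated over 1000 draws: {frac:.3f}")
```

With equal row counts, each draw is a fair coin flip between real and generated, so the printed fraction should hover around 0.5 and vary a little from run to run.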


lelandbatey 1940 days ago

I guessed GPT2 each time, 200 times in a row, and found that GPT2 was correct only 89/200 times, so about 45% of the samples were GPT2 for me.


wnoise 1940 days ago

In [2]: scipy.stats.binom_test(89, 200, 0.5)
Out[2]: 0.13736665086863936

Unusual to be this lopsided (1-in-7), but not crazy.
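The same number can be reproduced without SciPy, which is handy because newer SciPy releases replace `binom_test` with `scipy.stats.binomtest`. The sketch below is a standard-library version that only handles the fair-coin null (p = 0.5); the function name `binom_two_sided_p` is made up for illustration. It sums the probability of every outcome that is no more likely than the observed one, which is my understanding of the usual two-sided convention for an exact binomial test.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided p-value for k successes in n trials under H0: p = 0.5.

    Under p = 0.5 every outcome i has probability comb(n, i) / 2**n, so the
    comparison can be done on the exact integer binomial coefficients.
    """
    observed = comb(n, k)
    # Add up all outcomes that are at most as likely as the observed count.
    tail = sum(c for c in (comb(n, i) for i in range(n + 1)) if c <= observed)
    return tail / 2**n


p_value = binom_two_sided_p(89, 200)
print(p_value)
```

For 89 out of 200 this counts both tails (89 or fewer, and 111 or more), which is why it comes out at roughly 1-in-7 rather than 1-in-14.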


zaik 1940 days ago

Reminder that the p-value of a test is NOT the probability of H0 being true, see [0]. It only shows that, at a significance level of 0.05, we cannot reject the null hypothesis (in our case, that 89/200 is a draw from a binomial distribution with p=0.5).

[0] https://en.wikipedia.org/wiki/Misuse_of_p-values#Clarificati...


wnoise 1940 days ago

Yes. That is what I said and how I interpreted it. If the split is even (H0 true), getting a result at least that lopsided is a 1-in-7 deal.

It's rare that a p-value is what you want, but for answering "how unusual is this case", it's the exact right tool for the job.


hntrader 1939 days ago

It only shows the probability of observing either that statistic, or something even more extreme, under the null distribution. The implication that we can then "reject the null hypothesis" is more parlance and heuristic than anything.
