Hacker News
by brookst 1213 days ago
I’ve got a fun little side project that uses GPT. I tested GPTZero against 10 of my project’s writings and 10 of my own. It classified 6 out of 10 correctly in both cases (4 GPT-written bits were declared human, 4 human-written were declared GPT).

Which is better than 50% but not nearly good enough to base any kind of decision on.

1 comment

> which is better than 50%

Unrelated: the p-value for getting 12 out of 20 correct just by chance is ~0.4; that is, there is not enough data to support the conclusion "better" in this case.

Null hypothesis: 50%/50% (the result is random), using the normal approximation to the binomial distribution:

  H0: p=1/2
  H1: p!=1/2 (two-tail) 


  import statistics
  
  p0 = 0.5  # proportion of successes according to null hypothesis
  n = 20   # sample size
  p_sample = 12/n  # 12 from 20 are correct
  
  sigma = (p0 * (1 - p0) / n)**.5  # std according to H0
  z_score = (p_sample - p0) / sigma  # test statistic
  p_value = 2*statistics.NormalDist().cdf(-abs(z_score))  # prob. two-tails
  # p_value ≈ 0.37, i.e. roughly 0.4
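
Since n = 20 is fairly small for the normal approximation, an exact two-sided binomial test is a reasonable cross-check. A minimal sketch using only the standard library (the names `binom_pmf` and `p_value_exact` are made up for this example, and "two-sided" is taken here to mean all outcomes at least as far from the expected count as the observed one):

```python
from math import comb

n = 20    # sample size
k = 12    # observed number of correct classifications
p0 = 0.5  # proportion of successes according to the null hypothesis


def binom_pmf(i, n, p):
    """Probability of exactly i successes in n trials with success probability p."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


expected = n * p0               # expected number of successes under H0
deviation = abs(k - expected)   # how far the observation is from that

# Sum the probabilities of every outcome at least as extreme as the observed one
p_value_exact = sum(
    binom_pmf(i, n, p0)
    for i in range(n + 1)
    if abs(i - expected) >= deviation
)
print(round(p_value_exact, 2))  # 0.5
```

The exact p-value comes out at about 0.50, even further from significance than the approximation, so the conclusion is the same: 12 out of 20 is not enough to claim "better than chance".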