by gerbilly 1795 days ago
> N=36

This is just a cheap shot.

I'd be more impressed if you demonstrated why the sample size lacks the power to demonstrate an effect. (Use math and show your work.)

Also, it's a common mistake to assume that the study would be more 'valid' if the sample size were 3 million instead.


> I'd be more impressed if you demonstrated why the sample size lacks the power to demonstrate an effect.

Unless and until someone in this thread gets a copy of the paper so we can find out the effect sizes involved, we simply aren't able to objectively assess the study's statistical power.
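To make the dependence on effect size concrete, here is a rough sketch of how power at n=36 moves with the standardized effect. The effect sizes below are hypothetical placeholders, not values from the paper, and `approx_power` is an illustrative helper that uses a normal approximation for a two-sided one-sample (or paired) test.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided one-sample (or paired) z-test.

    effect_size is a standardized mean difference (Cohen's d style).
    This is a normal approximation; a t-test at the same n has slightly
    lower power.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability of landing in either rejection tail under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical effect sizes: the real ones are unknown without the paper.
    for d in (0.2, 0.5, 0.8):
        print(f"d={d}: power ~ {approx_power(d, 36):.2f}")
```

Under these assumptions the same n=36 is badly underpowered for a small effect and comfortable for a moderate one, which is why the sample size alone doesn't settle the question.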

But even then, I'm perhaps more worried about the file drawer effect. The type 1 error rate is fixed at 5%, n=36 studies are cheap, and p>.05 studies never get published. And we're looking at exactly one paper here. As far as I'm concerned, you can't have credibility without replicability.
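A minimal simulation of the file drawer worry, assuming a world where the true effect is exactly zero and only p < .05 results get written up. Everything here is made up for illustration (the number of studies, the known standard deviation of 1, the helper name `simulate_null_p_values`); it is not a model of any real literature.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_null_p_values(n_studies: int, n: int = 36, seed: int = 0) -> list:
    """Two-sided z-test p-values for studies in which the true effect is zero."""
    rng = random.Random(seed)
    z = NormalDist()
    p_values = []
    for _ in range(n_studies):
        # Each study draws n subjects from a population with mean 0, sd 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_stat = fmean(sample) * sqrt(n)  # sd is assumed known and equal to 1
        p_values.append(2 * (1 - z.cdf(abs(z_stat))))
    return p_values


if __name__ == "__main__":
    p_values = simulate_null_p_values(4000)
    # Only "significant" studies get published; the rest go in the drawer.
    published = [p for p in p_values if p < 0.05]
    print(f"{len(published)} of {len(p_values)} null studies reach p < .05")
```

Roughly one study in twenty clears the bar by chance, and in this toy setup every one of those published results is a false positive. A single paper can't tell you which world you're in.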

It’s not just about raw numbers, though; you need to consider the experimental design, which looks really tight in this case.

It’s a repeated measures study, and from the looks of it all subjects spent time in each of the four treatments + control, so it’s direct comparisons of the same people in each condition. They used three separate measures. Accounting for all that, they are working with something like 540 data points (36 subjects × 5 conditions × 3 measures), and the fact that it’s the same people in each set is a nice little feature for direct comparisons rather than a limitation. They even double-blinded everything. All of that has to count for something.
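For what it's worth, the 540 figure falls straight out of the design as described, and the benefit of reusing the same people can be sketched with the usual variance formula for a within-subject difference. The correlations below are hypothetical, and `paired_difference_sd` is an illustrative helper, not anything from the paper.

```python
from math import sqrt

SUBJECTS = 36    # N from the thread
CONDITIONS = 5   # four treatments + control
MEASURES = 3     # "three separate measures"

data_points = SUBJECTS * CONDITIONS * MEASURES
print(data_points)  # 540


def paired_difference_sd(sigma: float, rho: float) -> float:
    """SD of (treatment - control) within one subject.

    Assumes both scores have standard deviation sigma and correlation rho.
    """
    return sigma * sqrt(2 * (1 - rho))


# Hypothetical within-subject correlations: the higher the correlation,
# the less noisy each person's treatment-vs-control difference becomes.
for rho in (0.0, 0.5, 0.8):
    print(rho, round(paired_difference_sd(1.0, rho), 3))
```

Note that the 540 observations are not independent: the comparisons still rest on 36 people, and the gain comes from each person serving as their own control.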

It absolutely does, but it also leaves me even less inclined to make much of the abstract alone. You clearly have access to the full text. I don't, so the abstract and hearsay from others are all I've had to go on. But my concern about a repeated measures study is that the extra power doesn't come for free. It comes along with a bunch of new and subtle ways for endogeneity to sneak into your model, and, sadly, the menu of techniques for dealing with that introduces a lot of new ways to (accidentally or otherwise) engage in p-hacking.
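One simple way to see how analytic flexibility inflates false positives: if each treatment were compared with control on each measure, that would be 12 tests. How the authors actually analyzed the data isn't known from the thread, so the count of 12 and the independence of the tests are both assumptions made purely for illustration.

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


# Assumed, not taken from the paper: four treatments vs. control,
# tested separately on each of three measures.
comparisons = 4 * 3
print(f"{comparisons} tests: {familywise_error_rate(comparisons):.2f}")
```

With uncorrected tests and no true effects, the chance of at least one "hit" is a little under one in two, which is the kind of room a complicated design leaves open.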

Since a lot of the things that matter happen behind closed doors and aren't necessarily mentioned in the paper (elsewhere someone quoted Gilman; another of his good zingers is something to the effect of, "You don't talk about your exes during a date."), there's also just too much room for people to fire spitballs from the back row when you've got a complicated design like that.

On the other hand, a successful, independent replication can be quite compelling. Not only that, but it's on philosophically firmer ground. There's a reason why it was so central to Popper's original formulation of the scientific process. It's the empirical way to vet a result. Squabbling over the statistics, on the other hand, frequently devolves into a kind of sophistry with a different mix of Greek letters. It's great fun for economists, but this isn't economics, it's science.