by wirrbel 1242 days ago
short answer: it's complex and there are books on the topic.

less disappointing answer:

You have a hypothesis about how STUFF behaves differently when you make an intervention (an experiment, i.e. collect data, change something for the treatment group and nothing for the control group, collect more data).

Your default assumption (the null hypothesis) is that your experiment won't show a meaningful difference; the alternative is that it shows a difference (positive or negative). Now, what you observe may not match reality, which leaves you with 4 possible situations:

False-positive, true-positive, false-negative, true-negative

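Most statistical methods used in data analysis take great care to limit the probability of a false positive (the probability that our method yields 'positive' when in fact there is no effect in reality). This probability is the significance level, the threshold that the famous 'p-value' is compared against (loosely, 'p-value' is sometimes also used for the threshold itself). The p-value proper is the probability of seeing data at least as extreme as yours if there is in fact no effect.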

So when you run certain statistical tests, you get a p-value and apply a threshold, p<5% for example. This means you accept that roughly every 20th experiment in which there is in reality no effect still produces a 'significant' finding (i.e. a false positive).
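To make the "every 20th experiment" point concrete, here is a minimal simulation sketch. It is my own illustration, not anything from the study: it assumes a one-sample z-test with a known standard deviation of 1 (chosen only because it needs nothing beyond the Python standard library), and the names `z_test_p_value` and `false_positive_rate` are made up for this example.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming a known standard deviation."""
    n = len(sample)
    z = fmean(sample) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=20000, n=20, alpha=0.05, seed=1):
    """Share of experiments WITHOUT any real effect that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Data drawn with true mean 0, so every 'significant' result is a false positive.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) < alpha:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    print(f"share of 'significant' null experiments: {false_positive_rate():.3f}")
```

The printed share should land close to the chosen threshold of 0.05, i.e. about one in twenty.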

So naively increasing your sample size will not lower your false-positive probability, as long as your analysis method takes the sample size into account. However, the sample size strongly influences the false-negative rate. For example, a Student t-test with p<0.05 and sample size N=3 will still yield a false positive with 5% probability, but in practice it has only a slim chance of yielding a true-positive result.
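A rough sketch of the N=3 point, under assumptions of my own: two equal-sized groups of normally distributed data, a pooled two-sample t-test, and a true effect of one standard deviation. The critical values are the usual two-sided 5% values from a Student t table, hardcoded so the example runs without SciPy; `T_CRIT`, `t_statistic` and `positive_rate` are names invented for this illustration.

```python
import random
from statistics import fmean, variance

# Two-sided 5% critical values of Student's t from standard tables,
# keyed by per-group sample size n (degrees of freedom = 2n - 2).
T_CRIT = {3: 2.776, 16: 2.042}


def t_statistic(a, b):
    """Pooled two-sample t statistic for two groups of equal size."""
    n = len(a)
    pooled_var = (variance(a) + variance(b)) / 2.0
    return (fmean(a) - fmean(b)) / (2.0 * pooled_var / n) ** 0.5


def positive_rate(n, effect, trials=20000, seed=7):
    """Share of simulated experiments that the t-test calls 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n)]
        if abs(t_statistic(treated, control)) > T_CRIT[n]:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (3, 16):
        fp = positive_rate(n, effect=0.0)     # no real effect: false positives
        power = positive_rate(n, effect=1.0)  # real effect: true positives
        print(f"n={n:2d} per group  false-positive rate={fp:.3f}  true-positive rate={power:.3f}")
```

Under these assumptions the false-positive rate should sit near 5% for both group sizes, while the true-positive rate is much lower at n=3 than at n=16.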

From this perspective, the criticism here about sample size does not make too much sense. However, we need to keep a few things in mind:

A) There is a whole field of problems around controlling variables (i.e. adding more columns to your data table). Each variable adds another dimension to your problem, and this quickly leads to a 'curse of dimensionality' problem. Is the observed effect explained by your experimental intervention, or by differences between your control group and your study subjects (sex/gender/socioeconomic status/age/training level/overall health)? Not being able to control for a variable can quickly lead to false-positive results.

B) Complexity of the method at play. The study uses ANOVA (analysis of variance). It's been years since I last looked at it, so I am not making any statements here.

C) Crucially: many methods actually assume normally distributed data (Gaussian distribution). However, the data you collect is rarely normally distributed. One can still use methods for normally distributed data on non-normally distributed data because of the central limit theorem (often mixed up with the "law of large numbers"), i.e. sums and averages of many non-normally distributed values tend to end up approximately normally distributed. But this does not reliably happen at N=10.
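A small sketch of point C, again with assumptions of my own: the raw data is exponentially distributed (strongly right-skewed), and skewness is used as a crude measure of how far the distribution of sample means is from a symmetric bell curve. `skewness` and `skew_of_sample_means` are made-up helper names.

```python
import random
from statistics import fmean


def skewness(values):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    mean = fmean(values)
    m2 = fmean((v - mean) ** 2 for v in values)
    m3 = fmean((v - mean) ** 3 for v in values)
    return m3 / m2 ** 1.5


def skew_of_sample_means(n, trials=4000, seed=3):
    """Skewness of the distribution of means of n exponentially distributed values."""
    rng = random.Random(seed)
    means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]
    return skewness(means)


if __name__ == "__main__":
    for n in (10, 400):
        print(f"n={n:3d}  skewness of sample means = {skew_of_sample_means(n):.2f}")
```

With this setup the means at n=10 should still be visibly skewed, while at n=400 the skewness shrinks toward zero. How large N has to be in practice depends on how non-normal the raw data is.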

There are a few finer points to mention here. Many HN commenters have a machine-learning background and may be a bit biased against small-sample-size studies, for reasons specific to what they are used to in the machine-learning world. On the other hand, from my experience majoring in biophysics, many health-related studies on sports and obesity really do have low-quality stats and overestimate the predictive power of their datasets.

tl;dr: I would only conclude from this study that HIIT is better than nothing, not that it is better or worse than other cardio exercise.

PS: The above text tries to break down complex stuff and thereby by definition contains mistakes.