by devaboone 2109 days ago
There is a lot of excitement about this study, but the details in it matter. It was randomized - but when you look at the two groups (treatment vs. control), they had some key differences. The control group (who didn't receive Vitamin D) had more men (69% vs 54%), more people with hypertension (57% vs 24%) and diabetes (19% vs 6%). So the groups were not the same. If you just compare the numbers of those who went into the ICU vs. not, then the results look amazing. But once you correct for hypertension and diabetes, it was not so impressive - it was not statistically significant in the study. (It is in the study but most people breeze right over that.)
2 comments

I disagree with your interpretation of the statistics. There are going to be inter-group differences with any controlled study. This is sampling error and it's exactly what the p-value and confidence intervals are meant to account for.

It's unlikely that the strongly significant result we're seeing in the study is due to sampling error.

I don't think it is all due to sampling error. In my article I mention that one of the areas in which Vitamin D shows promise is with respiratory illness. I think there likely is an effect, just not as dramatic as the headlines are making it.
Interesting! Can you quote the part of the study where they say the results are not statistically significant when you control for diabetes and hypertension?
The study says the opposite:

> Randomization generated groups with comparable percentage of unfavorable risk factors as there was no significant difference in subjects with at least one risk factor, except for high blood pressure and diabetes mellitus, known risk factors for unfavorable disease progression, which were more frequent in patients not treated with calcifediol.

> However, even considering these factors, calcifediol significantly decreased the need for ICU admission in COVID-19 patients in a way not previously reported in this process until now.

Here is what the article states: "Therefore, a multivariate logistic regression analysis was performed to adjust the model by possible confounding variables such as hypertension and type 2 diabetes mellitus for the probability of the admission to the Intensive Care Unit in patients with Calcifediol treatment vs Without Calcifediol treatment (odds ratio: 0.03 (95%CI: 0.003-0.25) (Table 3). The dependent variable considered was the need to be treated or not in ICU (dichotomous variable).) CI:-0.30 - 0.03 p:0.08." The statement is worded in a confusing way, but that is a non-significant p value. Of course we should not put too much into "statistical significance" but it is interesting to note.
The only reason that p value is a bit high in that second multivariate analysis is because of the uncertainty of how much all the different risk factors like hypertension, T2DM, age >= 60 etc. affect ICU admission numbers.

But even with those variables controlled, the 95% confidence interval for the odds ratio is 0.003-0.25, which at worst is a 4-fold reduction in the odds of ICU admission.
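
As a rough consistency check on those numbers, here is a minimal sketch that backs out the p-value implied by the quoted odds ratio and interval. It assumes the interval is a standard Wald interval, symmetric on the log-odds scale; the study does not say how it was computed, and the function name is made up for illustration.

```python
import math
from statistics import NormalDist

def wald_p_from_ci(odds_ratio, lower, upper, level=0.95):
    """Approximate two-sided p-value implied by an odds ratio and its CI.

    Assumption: a Wald interval, symmetric on the log-odds scale.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    # Half the CI width on the log scale, divided by z_crit, is the standard error.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    return 2 * (1 - std_normal.cdf(abs(z)))

# Values quoted from the study in this thread.
p = wald_p_from_ci(0.03, 0.003, 0.25)
print(f"implied p-value: {p:.4f}")
```

Under that assumption the interval 0.003-0.25 implies a p-value well below 0.05, so the p of 0.08 that the study attaches to the separate interval -0.30 to 0.03 presumably comes from a different model or scale.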

We should also note that the Calcifediol treatment group had 14 patients ≥ 60 years old, and the non-Calcifediol group had 5. So the study looks even better with that in mind...

Unless I'm reading your comment wrong, this p-value (0.03) is actually significant for a 95% confidence test.
0.03 is the odds ratio (a measure of effect size; an odds ratio of 1 would mean the treatment and control arms had the same odds of ICU admission). The p value is 0.08, from here: "CI:-0.30 - 0.03 p:0.08." It is the probability of seeing a result at least this extreme if the treatment had no effect.
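
To make the odds ratio concrete, a small sketch with hypothetical counts (these are NOT the study's numbers, and the function name is only illustrative):

```python
def odds_ratio(treated_events, treated_total, control_events, control_total):
    """Odds of the event in the treated arm divided by odds in the control arm."""
    treated_odds = treated_events / (treated_total - treated_events)
    control_odds = control_events / (control_total - control_events)
    return treated_odds / control_odds

# Hypothetical counts, chosen only to illustrate the definition.
same_rate = odds_ratio(10, 100, 10, 100)   # identical arms give an odds ratio of 1
fewer_events = odds_ratio(5, 100, 20, 100)  # fewer treated events give a ratio below 1
print(same_rate, round(fewer_events, 3))
```

An odds ratio of 0.03 would sit far below either of these illustrative values, which is why the effect size in the study reads as so large.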

That said, just looking at p values and applying a cutoff at 0.05 is pretty bad practice that is getting a lot of heat thanks to the replication crisis (does it make sense to behave as though something at p=0.08 is not true and something at p=0.049 is true? almost certainly not). If you get a value in this range and a huge effect size then it's a really good idea to repeat the experiment with way more data. It's also a common stats error to act as though p>0.05 is the same as knowing something DOES NOT work; all you can say is that this specific study wasn't able to show that it does work at the 0.05 significance level.

That's part of the confidence interval. The adjusted p-value is 0.08. However, the cut-off of 0.05 is just a convention (see https://en.wikipedia.org/wiki/Misuse_of_p-values ). I'd think of this as a grey scale, where 0.08 is roughly in the significance range.

The null hypothesis is that VitD has no effect. If that is true, we should expect tests to come back non-significant most of the time. How often? 95% of the time at a 0.05 cutoff. The other 5% of the time we can expect to see spurious results from our simplistic model of random processes. Upshot: it's a small change to move those numbers to 92% vs. 8%. In this context, it's fine to say "this was a small pilot that directionally shows we should do a much bigger test", which is what they're now doing.
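
The 95%/5% versus 92%/8% point can be checked with a quick simulation. This is a sketch under simplifying assumptions that have nothing to do with the study's design: two arms drawn from the same normal distribution (so the null is true by construction), known unit variance, and a plain two-sided z-test. All names are illustrative.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def null_p_value(rng, n_per_arm=20):
    """Two-sided z-test p-value for two arms drawn from the same N(0, 1)."""
    arm_a = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    arm_b = [rng.gauss(0, 1) for _ in range(n_per_arm)]
    se = (2 / n_per_arm) ** 0.5  # standard error of the mean difference, unit variance
    z = (fmean(arm_a) - fmean(arm_b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

rng = random.Random(0)  # fixed seed so the run is repeatable
p_values = [null_p_value(rng) for _ in range(20000)]

# Share of "significant" results when there is no real effect.
rate_05 = sum(p < 0.05 for p in p_values) / len(p_values)
rate_08 = sum(p < 0.08 for p in p_values) / len(p_values)
print(f"p < 0.05: {rate_05:.3f}, p < 0.08: {rate_08:.3f}")
```

With no real effect, roughly 5% of runs fall below 0.05 and roughly 8% fall below 0.08, so loosening the cutoff trades a few more false alarms for a bit more sensitivity.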

I’d also like specifics about the “not statistically significant” part - because otherwise there can be a suspicion of cherry-picking the available data. Which I felt, reading parts 2 and 3 of these blog posts, was definitely a possibility.
I tried really really hard to put in all of the most significant studies on Vitamin D. Small, low-quality studies didn't make it in, but the major trials did.
To start, I appreciate your effort on this post. It looks like it took a lot of work and it does a great job of explaining difficult topics like why controls are important.

That said, I don't think it's fair to characterize the COVID study as low-quality. It's worth mentioning contrary results, even if it's an exception that proves the rule.