by nonbel 3519 days ago
Unless I am misunderstanding something, their figure 3 seems to be plotting effect size vs. p-value, so all it would be showing is that they had more data from lung adenocarcinomas (i.e. the sample size is larger for that cancer type). It isn't 100% clear to me whether they shared the data used for that figure, but here is how often each cancer type appears in table S1:

  Acute myeloid leukaemia (AML)                       Bladder 
                            202                           399 
                         Cervix             Colorectal cancer 
                            168                           559 
      Esophageal Adenocarcinoma           Esophageal Squamous 
                            242                           292 
                 Gastric cancer                        Kidney 
                            472                           257 
                         Larynx                         Liver 
                            123                           392 
                     Lung Adeno                 Lung Squamous 
                            678                           175 
                    Oral cavity                Ovarian cancer 
                            363                           458 
                       Pancreas                       Pharynx 
                            239                            76 
         Small Cell Lung Cancer 
                            148 
It is essentially just figure 1 from here: https://arxiv.org/abs/1311.0081

Looking more I see:

"Comparison of overall methylation between smokers and non-smokers was performed for all tobacco-associated cancer types for which there were available data from Illumina Infinium HumanMethylation450 BeadChip array, where each array contains 473,864 autosomal CpG probes. The examined data were downloaded from the original data source (Table S1)

[...]

distributions were subsequently compared between smokers and non-smokers using a two-sample Student’s t-test. Results were considered significant for Bonferroni threshold of 10^-7."

So it is not like figure 1 from that Lew paper after all, because their effect size is not normalized to the inter-individual variance. This is a point in their favor.

However, the sample sizes do match up to those found in table S1 (which I posted above). From the data provided, we cannot tell whether that difference in p-values is solely due to sample size or not. They need to tell us the variance for each CpG/tissue combo as well.
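
To make the point concrete, here is a rough sketch of how the same raw difference can land on either side of the 10^-7 threshold depending on sample size and variance alone. Only the totals (678 for Lung Adeno, 76 for Pharynx) come from table S1; the even smoker/non-smoker split, the mean difference of 0.05 and the standard deviations are made-up numbers for illustration, and the p-value uses a normal approximation rather than the exact t distribution.

```python
import math

BONFERRONI = 1e-7  # threshold quoted from the paper


def t_statistic(mean_diff, sd, n1, n2):
    """Pooled-variance two-sample t statistic, assuming both groups share one SD."""
    standard_error = sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return mean_diff / standard_error


def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation (reasonable for large df)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def p_for(total_n, mean_diff, sd):
    """p-value for a tissue, ASSUMING an even smoker/non-smoker split."""
    n1 = total_n // 2
    n2 = total_n - n1
    return approx_two_sided_p(t_statistic(mean_diff, sd, n1, n2))


# Hypothetical effect (difference in mean methylation) and SD, same for both tissues.
p_lung = p_for(678, mean_diff=0.05, sd=0.10)
p_pharynx = p_for(76, mean_diff=0.05, sd=0.10)

# Same sample size and effect as Lung Adeno, but a hypothetical larger SD.
p_lung_noisy = p_for(678, mean_diff=0.05, sd=0.30)

for label, p in [("Lung Adeno", p_lung),
                 ("Pharynx", p_pharynx),
                 ("Lung Adeno, 3x SD", p_lung_noisy)]:
    verdict = "significant" if p < BONFERRONI else "not significant"
    print(f"{label:>18}: p = {p:.2e} ({verdict})")
```

Under these assumptions the identical raw difference passes the threshold only for the large, low-variance case, which is why the per-CpG/tissue variance is needed to interpret the figure.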