by tgb 2546 days ago
A reason to take this with a grain of salt: transcript length is the biggest technical effect in RNA sequencing. Longer transcripts get broken into more fragments and get sequenced more deeply. What this means is that if you perform any experiment you tend to get a "length effect" of some sort. The second feature they mention in the GTEx data is GC-content, which is probably the second biggest technical bias in RNA sequencing and again basically any experiment has a "GC-content effect" of some sort. But I don't interpret those as meaning that there is something directly acting on long transcripts or high-GC transcripts, rather that whatever is happening biologically ends up appearing as a length or GC effect after sequencing. It's a little fishy that the only features they find are features that I would expect to always find.

The most compelling reason to think that's not simply the case here is that they seem to be noticing a consistent downward trend across all long transcripts with age, which is more compelling than merely noting that long transcripts change (some up and some down).
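
To make the length effect above concrete, here is a minimal sketch, assuming an idealized model where every copy of a transcript is cut into equal-sized fragments and every fragment is sequenced. The function names, the 200 bp fragment size, and the toy transcripts are all hypothetical, chosen only for illustration; real library prep is far messier.

```python
def expected_fragments(length_bp, copies, fragment_bp=200):
    """Expected fragment count under the idealized model:
    each copy is cut into length_bp / fragment_bp fragments."""
    return copies * length_bp / fragment_bp


def raw_counts(transcripts, fragment_bp=200):
    """Map transcript name -> expected fragment count.
    `transcripts` maps name -> (length in bp, number of copies)."""
    return {
        name: expected_fragments(length_bp, copies, fragment_bp)
        for name, (length_bp, copies) in transcripts.items()
    }


# Two hypothetical transcripts with the SAME number of copies in the cell.
transcripts = {"short": (1_000, 10), "long": (10_000, 10)}
counts = raw_counts(transcripts)

# Same abundance, but the long transcript contributes 10x the fragments,
# so it is sampled much more deeply.
ratio = counts["long"] / counts["short"]
```

Because the long transcript is sampled more deeply, a change of the same relative size is easier to detect in it, which is one way a purely technical length dependence can show up in almost any comparison.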

2 comments

It's a good point. Why would the length effect you describe be associated with age, across many organs, cell types, datasets, and species? The technical effect would be a good explanation for this finding in one dataset, but it seems unlikely that many datasets would have a technical length effect that correlates with age by chance.
What I'm saying is that it looks like there's an effect and that effect is visible as a change in expression vs length, but that I wouldn't expect it to be too related to length in a meaningful way biologically. If you take one population of transcripts and another and you measure the lengths, it's likely that you'll see a shift in the median - regardless of whether length is important, particularly due to the specific ways in which length relates to sequencing depth. And on top of that, comparing across genes requires compensating in some way for the length of the gene and it's not obvious how to do that correctly - could they be finding an artifact of how they normalized for length? (Eg: a "gene" actually doesn't have a single length, it has multiple possible transcript variants of different lengths, and most reads from the sequencer are ambiguous as to which variant they came from. Quantify the different transcripts incorrectly - and it's impossible to do it correctly - and you may be mis-estimating the effective length and mis-normalizing.) It's a starting point of an investigation, not an end point.

(And they do try to take the next step in that investigation, and they report that they see a further decrease in a gene related to transcribing long transcripts. However, it's 27th in their list of related genes, and I'm not sure how unlikely it is that one of the top N genes has a reported connection to transcription. Hopefully they will follow up with a biological experiment involving knock-down of this gene and seeing an accelerated aging phenotype or something of that sort.)

The most compelling piece of evidence in my mind here is that the effects they report are consistent in direction across conditions. The most worrisome is that they tested a bunch of factors and the only ones they report as consistently informative are the ones most confounded with technical aspects of sequencing, and therefore with any number of underlying biological changes.
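
For the length-normalization concern, a minimal sketch, assuming a TPM-style normalization (counts divided by effective length, then rescaled to sum to one million). This is not the paper's pipeline; the gene names, counts, and lengths are hypothetical and only illustrate how a wrong effective length shifts the normalized values.

```python
def tpm(counts, eff_lengths):
    """TPM-style normalization: per-base read rate, rescaled to sum to 1e6."""
    rates = {gene: counts[gene] / eff_lengths[gene] for gene in counts}
    total = sum(rates.values())
    return {gene: 1e6 * rate / total for gene, rate in rates.items()}


# Hypothetical data: identical read counts for both genes.
counts = {"geneA": 500.0, "geneB": 500.0}

# Assumed "true" effective lengths versus a mis-estimate in which geneB's
# reads are attributed to a shorter isoform than the one actually expressed.
true_lengths = {"geneA": 1_000.0, "geneB": 5_000.0}
wrong_lengths = {"geneA": 1_000.0, "geneB": 2_500.0}

tpm_true = tpm(counts, true_lengths)
tpm_wrong = tpm(counts, wrong_lengths)

# With the same raw counts, halving geneB's assumed length raises its
# normalized expression and, because TPM is compositional, lowers geneA's.
shift_b = tpm_wrong["geneB"] - tpm_true["geneB"]
shift_a = tpm_wrong["geneA"] - tpm_true["geneA"]
```

Nothing about the underlying reads changed between the two calls; only the assumed length did. If isoform usage shifts with age, an error of this kind could plausibly line up with age as well.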

Thanks, your points make sense. It’s definitely a worrisome coincidence given the multiple tests they ran but didn’t correct for. I hope to see that knockout experiment you describe!
It could still be an artifact of something else, e.g. differential digestion associated with age produces length and GC artifacts.