by marsdentech 1956 days ago
As others have said, you're reading a sliding window of k-mers over the target sequence; I think for the MinION k is presently 5. To answer your question directly, it struggles with homopolymer runs, not inherently because they're low complexity, but because it's tricky to "clock" how many like, contiguous k-mers have passed through the pore after a given period of time. That is to say, for example, if your target sequence is "GGGGGGG" (i.e. a homopolymer run of 7 Gs), you'd expect to observe three like, contiguous signals (i.e. in current space) for the all-G 5-mer, one signal per "clock cycle" (which corresponds to the dwell time of the k-mer in the pore). If these "clock cycles" were always constant, it would merely be a case of dividing the "time spent on the observed all-G 5-mer" signal by the "time spent on one clock cycle". Sadly, for our purposes, there's enough wobble in any one such "clock cycle" that this calculation won't always yield a reliable result. The upshot: your "GGGGGGG" (7 Gs) target sequence may be registered as "GGGGGG" (6 Gs) or "GGGGGGGG" (8 Gs), or even something else. Now, for distinguishing two alleles where the difference between them is, say, a doubling in length of an already-very-long homopolymer run, even with the aforementioned "clock wobble", you'd likely be able to see that in MinION data quite clearly. As with all things DNA sequencing (for the time being, at least!), your precise biological question will determine which (one or more) sequencing techniques are best for the job!
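To make the "divide by one clock cycle" arithmetic concrete, here is a toy sketch. It is not how any real basecaller works: the function names, the Gaussian jitter model, and the dwell-time values are all assumptions made up for illustration, and k = 5 is just the value guessed at above.

```python
import random

K = 5  # k-mer size; the comment above only says "I think" it is 5 for the MinION


def homopolymer_kmer_count(run_length, k=K):
    """Number of identical k-mers produced by a homopolymer run of this length."""
    return max(run_length - k + 1, 0)


def estimate_run_length(total_dwell, mean_dwell, k=K):
    """Divide time spent on the flat signal by one nominal clock cycle."""
    n_kmers = max(round(total_dwell / mean_dwell), 1)
    return n_kmers + k - 1


def simulate_call(run_length, mean_dwell=1.0, jitter=0.4, rng=None, k=K):
    """Simulate one read of a homopolymer (run_length >= k) with wobbly dwell times.

    The Gaussian jitter is a hypothetical stand-in for the real "clock wobble".
    """
    rng = rng or random.Random()
    n_kmers = homopolymer_kmer_count(run_length, k)
    # Each k-mer dwells for roughly mean_dwell, clamped to stay positive.
    total = sum(max(rng.gauss(mean_dwell, jitter), 0.05) for _ in range(n_kmers))
    return estimate_run_length(total, mean_dwell, k)
```

With a perfectly steady clock, 7 Gs give three all-G 5-mers and `estimate_run_length(3.0, 1.0)` recovers 7. If the same three k-mers happen to take 2.4 or 3.6 cycles' worth of time in total, the same division yields 6 or 8, which is the miscall described above.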

Just a thought. If the DNA were run through 2 such holes, you could use a nearby non-uniform sequence to clock the reading of the other one. Not a magic bullet, but maybe an improvement. Assumes the readers can be close enough to bound the amount of slack between them, and that they don't interfere with each other.
Good thinking! The newer R10 pores have a dual read head, essentially two holes in sequence inside the same barrel. The linked page [1] has an image.

[1] https://nanoporetech.com/about-us/news/r103-newest-nanopore-...

A single flow cell contains a few thousand pores (I think this is what you mean by "holes") that are all at different stages of passing different molecules, with signal data being captured from a few hundred at any given time. In practice you'd never expect (nor could you arrange) for two pores to be at the same stage of processing the same (or any pre-determined) molecule at the same time, so correlation information like that is out. The "clock rate" is determined by the so-called motor protein that "pulls" the nucleic acid molecule through the pore, if you fancy going down the reading rabbit-hole...
No, I meant a single pore with two readers. So the same molecule is being read at 2 positions. Movement might be detectable in one, but not the other because it's full of repeats.
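A toy sketch of that two-reader idea, purely to illustrate the bookkeeping. Everything here is an assumption: the function names are hypothetical, the signals are idealised noise-free current levels sampled on a shared time axis, there is no slack between the two readers, and every one-base advance is assumed to show up as a level change at the reader sitting on non-repetitive sequence.

```python
def count_steps(signal, start, end, threshold=0.5):
    """Count level changes in signal[start:end]; each is taken as a one-base advance."""
    return sum(
        1
        for i in range(max(start, 1), end)
        if abs(signal[i] - signal[i - 1]) > threshold
    )


def run_length_from_second_reader(flat_start, flat_end, other_signal, k=5, threshold=0.5):
    """Estimate a homopolymer run length seen at one reader using the other as a clock.

    flat_start/flat_end bound the samples (end exclusive) where the first reader
    shows the flat homopolymer level. Advances during that window are counted at
    the other reader: n identical k-mers are separated by n - 1 advances, and a
    run of n k-mers spans n + k - 1 bases, so the run length is steps + k.
    """
    steps = count_steps(other_signal, flat_start + 1, flat_end, threshold)
    return steps + k


# Reader B is flat over samples 2..7 (three all-G 5-mers, two samples each).
# Reader A, over non-uniform sequence, changes level at every advance.
reader_a = [0, 0, 1, 1, 3, 3, 2, 2, 5, 5]
estimate = run_length_from_second_reader(2, 8, reader_a)
```

In this made-up trace the second reader sees two advances while the first is flat, giving an estimate of 7 bases regardless of how long each dwell lasted. In real data, neighbouring k-mers with similar current levels and any slack between the readers would both break this simple counting.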