| I can address some of these. Sequencing today relies heavily on computational methods. Think of DNA as a couple of long strings (the number of bases is effectively the character count of those strings, and each string is a "chromosome" in higher organisms), so the problem is how to read these long, physical strings. It turns out that parallel processing is far more effective, so we break the very long strings into much, much smaller strings that overlap (going from millions of characters long down to, often, hundreds). Because the small strings overlap, we can reconstruct a good portion of the actual sequence computationally by matching up their overlapping ends. Physically, this is done with machines (think GPU vs. CPU) that are effectively a bunch of parallel microscopes specialized to read those short strings, with "colors" attached to each of the characters (DNA bases). Early DNA sequencing methods lacked both the computational and the physical tools to do this, so sequencing was done by hand. The move from sequencing by hand to sequencing computationally is why we see such a significant increase in characters read (number of bases). Your last comment is, I think, the most interesting, as it effectively asks, "Why do mice have a larger string size than us, which means they contain more information on an absolute level?" The answer is: just because. The number of bases, or even the number of blocks of information that produce proteins (these blocks are called genes, and a protein is another chemical construct that mainly carries out actions in the cell), is not strongly correlated with the complexity of the organism. The key is how those bases interact, not necessarily how many there are. If you have any more questions or need some clarification, I'd love to address them. It is a wonderful time to be alive. |
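To make the "overlapping small strings" idea concrete, here is a toy sketch in Python. It is not how a production assembler works (real tools build overlap or de Bruijn graphs and have to cope with read errors, reverse complements, and repeats). It just greedily merges whichever two reads share the longest suffix/prefix overlap. The function names `overlap` and `greedy_assemble`, the `min_len` cutoff, and the example reads are all made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` characters exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Stops when no two remaining strings overlap by at least `min_len`;
    whatever is left over is returned (ideally a single string).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads
        # Glue the two reads together, keeping the shared part only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


# Three overlapping 8-character "reads" from a made-up 16-character sequence,
# deliberately given out of order.
reads = ["AGGCTTAC", "ACGTTGCA", "TGCAAGGC"]
print(greedy_assemble(reads, min_len=4))  # ['ACGTTGCAAGGCTTAC']
```

The brute-force pairwise comparison is quadratic in the number of reads, which is fine for three strings and hopeless for the hundreds of millions a real sequencing run produces. It also goes wrong as soon as the same stretch of characters shows up in more than one place, because the greedy step can no longer tell which overlap is the true one.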
1. What is the (maximum) range of read lengths that modern gene sequencers can produce? Any timeline on when those read lengths will increase substantially?
2. How do bioinformatics people contend with repetitive genomic regions?
3. Are there any differences in how gene sequencing technology works on DNA from different species? For example, does an approach that works on humans (e.g. gene sequence alignment or de novo assembly) work on something like wheat?