by harveywi 3434 days ago
Thanks for volunteering to answer some questions.

1. What is the (maximum) range of read lengths that modern gene sequencers can produce? Any timeline on when those read lengths will increase substantially?

2. How do bioinformatics people contend with repetitive genomic regions?

3. Are there any differences in how gene sequencing technology works on DNA from different species? For example, does an approach that works on humans (e.g. gene sequence alignment or de novo assembly) work on something like wheat?


1. It depends on the technology. Illumina (the cheapest technology, with the highest throughput) gives you the first and last 125 bases of smallish DNA molecules (a read pair) at an acceptable error rate. Pacific Biosciences (lower throughput and more expensive) gives you reads of up to 40,000 bases, but with a rather horrible error rate.
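To make the Illumina description concrete, here is a minimal sketch of what one read pair looks like for a single fragment. It assumes the common forward-reverse layout, where the second read is sequenced from the opposite strand and is therefore the reverse complement of the fragment's last bases. The function names and the 420-base toy fragment are hypothetical, and the sketch ignores sequencing errors and quality scores.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment: str, read_len: int = 125) -> tuple[str, str]:
    """Simulate an error-free read pair from one DNA fragment.

    Read 1 covers the first read_len bases. Read 2 is sequenced from the
    other strand, so it is the reverse complement of the last read_len bases.
    Anything between the two reads is never sequenced.
    """
    if len(fragment) < read_len:
        raise ValueError("fragment is shorter than the read length")
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


fragment = "ACGGTCA" * 60  # hypothetical 420-base fragment
r1, r2 = paired_end_reads(fragment)
print(len(r1), len(r2), len(fragment) - 2 * len(r1))  # prints "125 125 170"
```

The 170 bases in the middle of this fragment are never read; that unread stretch is what gives paired-end reads their "approximately known distance". If a fragment is shorter than 250 bases, the two reads simply overlap.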

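2. They fail epically. When a repeat is longer than the reads, there is nothing you can do computationally: the reads simply don't contain the information about which copy of the repeat is which. With paired-end reads (two reads at an approximately known distance), you still can't assemble repetitive regions, but you can get the contigs around the repeat in the right order.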
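A toy experiment shows why software can't fix this. Assume perfect, error-free reads starting at every position of the genome. The sketch below builds two hypothetical genomes that differ only in the order of the unique segments between three copies of a 10-base repeat, then compares the multisets of reads each would produce. In this example, reads of up to 11 bases (repeat length plus one) give identical read sets, so no assembler could tell the genomes apart. Once a read can span the whole repeat plus one base on each side (12 bases), the read sets differ. All sequences are made up for illustration.

```python
from collections import Counter


def read_spectrum(genome: str, read_len: int) -> Counter:
    """Multiset of every error-free read of length read_len (uniform, perfect coverage)."""
    return Counter(genome[i:i + read_len] for i in range(len(genome) - read_len + 1))


# Hypothetical toy sequences: unique segments a..d and a 10-base repeat r occurring three times.
a, b, c, d = "T" * 10, "C" * 10, "G" * 10, "A" * 10
r = "GATTACAGGC"

genome1 = a + r + b + r + c + r + d
genome2 = a + r + c + r + b + r + d  # b and c swapped between repeat copies

for read_len in (len(r) + 1, len(r) + 2):
    same = read_spectrum(genome1, read_len) == read_spectrum(genome2, read_len)
    print(read_len, "indistinguishable" if same else "distinguishable")
```

With 11-base reads every read lies within one segment plus part of one repeat copy, and both genomes have exactly the same set of segment/repeat junctions, so the evidence is identical.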
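The paired-end trick for ordering contigs can be sketched too, under simplifying assumptions: every pair is already mapped, both contigs are in forward orientation, and there is a single known insert size. A pair whose mates land on different contigs links those contigs in order. The insert size then gives a rough estimate of the gap between them, which is roughly the length of the unassembled repeat. The input format, function name and contig names are hypothetical; real scaffolders also deal with orientation, ambiguous mappings and insert-size variance.

```python
from collections import defaultdict
from statistics import median


def scaffold_links(pairs, contig_len, insert_size, min_links=2):
    """Estimate contig order and gap size from paired-end reads (simplified sketch).

    pairs: iterable of (left_contig, frag_start, right_contig, frag_end), where
      frag_start is the 0-based position where the fragment begins on left_contig
      and frag_end is the 0-based position just past where it ends on right_contig.
      Both contigs are assumed to be in forward orientation (hypothetical format).
    Returns {(left_contig, right_contig): median_gap} for links with >= min_links pairs.
    """
    gaps = defaultdict(list)
    for left, start, right, end in pairs:
        if left == right:
            continue  # both mates on one contig: says nothing about contig order
        on_left = contig_len[left] - start  # fragment bases on the left contig
        on_right = end                      # fragment bases on the right contig
        # insert_size = on_left + gap + on_right, so solve for the gap
        gaps[(left, right)].append(insert_size - on_left - on_right)
    return {link: median(g) for link, g in gaps.items() if len(g) >= min_links}


contig_len = {"ctg1": 1000, "ctg2": 800}  # hypothetical contigs flanking a repeat
pairs = [
    ("ctg1", 900, "ctg2", 200),
    ("ctg1", 950, "ctg2", 250),
    ("ctg1", 880, "ctg2", 190),
    ("ctg1", 100, "ctg1", 600),  # both mates inside ctg1: ignored
]
print(scaffold_links(pairs, contig_len, insert_size=500))  # {('ctg1', 'ctg2'): 200}
```

The result says ctg1 comes before ctg2 with roughly 200 bases between them. The sequence of that gap (the repeat itself) is still unknown; only its approximate length and the order of its flanks are recovered.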

3. Definitely, but I don't know the details. Plants are often more difficult than animals: they have bigger genomes and are often polyploid (they carry multiple chromosome sets). Assembling a wheat genome is more difficult than assembling the human genome, and I'd argue that even the latter isn't actually a solved problem.