Hacker News
by valine 3438 days ago
This is really interesting. For someone who knows nothing about the subject, how were DNA strands physically read at a low level before computational methods? I was under the impression DNA is too small to see without an electron microscope. You mention reading DNA by hand, and I'm really interested in how that is done.
3 comments

Given a string, I can easily discern one characteristic: its length. That's because the length of the string is tied to how "massive" it is, and when I push on things, the more massive ones move more slowly. That's the general idea behind gel separation.

Now, I just need a way to make all the substrings that start from the first position (0 => 1, 0 => 2, etc.). This is a bit more difficult to explain and chemically intensive, but let's assume for each character (C) we have another character (C') that is pretty much the same thing. The key difference, however, is that C' is marked (with radioactivity or with something that lights up) AND that it doesn't allow any more characters to be added on after it. If each distinct C' is a different color, we can now distinguish between our different substrings based entirely on the last character. We know that our strings are ordered by size, so we can reconstruct our original sequence by reading off the terminal character of each substring, shortest to longest.
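To make the string analogy concrete, here is a minimal Python sketch of the idea, not a model of the real chemistry. The names `DYE`, `terminated_fragments`, and `read_gel` and the color assignments are made up for illustration, and it ignores the fact that the fragments are built as a complementary copy of the template.

```python
import random

# Hypothetical dye colors: one per marked terminator character C'.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in DYE.items()}

def terminated_fragments(template):
    """Return one (length, dye color) pair per prefix of the template.

    Each prefix ends in a marked character that stops further extension,
    so all we can observe is the fragment's length and its color.
    """
    fragments = [(i + 1, DYE[base]) for i, base in enumerate(template)]
    random.shuffle(fragments)  # in the tube the fragments are in no particular order
    return fragments

def read_gel(fragments):
    """Sort by length (the job the gel does) and read off the terminal colors."""
    ordered = sorted(fragments, key=lambda frag: frag[0])
    return "".join(COLOR_TO_BASE[color] for _, color in ordered)

print(read_gel(terminated_fragments("GATTACA")))  # GATTACA
```

The only information `read_gel` uses is size and color, which is the point: sorting by size recovers the order, and the color of each terminal character recovers the letter.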

You can imagine this process being done by hand, and it works at that scale. However, it doesn't scale well to the millions and billions of base pairs we need in the modern day.

As a fun aside, protein sequences were originally determined in a way pretty much the inverse of this. For a given protein string, remove the first element with chemistry. Then, try to figure out what you removed. Now take your string of size N - 1 and repeat, until you have determined each character. This method ended up not being tractable for DNA because of chemical differences. Also, a lot of protein sequencing is done in a similar way to DNA sequencing, in that we break up (shatter may be a better word) the protein. We then try to reconstruct the original protein based on how it shatters (like reconstructing a window based on knowing where the pieces fall and where the baseball came from).

That depends on the technology. The technology I most often work with will have lots of fake DNA basepairs in a soup, which has the real subject's DNA broken apart into fragments and attached to a substrate to keep it from moving. The fake DNA basepairs bind to the complementary real basepairs and emit fluorescent light when doing so. En masse, the fluorescent light gets captured on camera and each of the possible basepairs' colors is scored, to provide a call for that basepair. Repeat the cycle a few times, map the lights to the same spots, and you generate a sequence of basepair 'reads' which are then sent further down a software pipeline for later analysis.
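As a rough illustration of the "score each color, make a call" step, here is a small Python sketch. It assumes one intensity value per base channel per cycle and simply picks the brightest channel; the channel order, the function names, and the intensity numbers are all invented for the example, and real base callers do a lot more (cross-talk correction, quality scores, and so on).

```python
# Assumed channel order for the four fluorescent colors (hypothetical).
CHANNELS = ("A", "C", "G", "T")

def call_base(intensities):
    """Pick the channel with the strongest signal for one spot in one cycle."""
    best = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[best]

def call_read(cycles):
    """Turn the per-cycle intensities of a single spot into a read."""
    return "".join(call_base(intensities) for intensities in cycles)

# Made-up intensities for one spot across four cycles.
spot = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.1, 0.8, 0.2),
    (0.0, 0.2, 0.1, 0.7),
    (0.1, 0.9, 0.1, 0.0),
]
print(call_read(spot))  # AGTC
```

Each spot on the substrate gets its own list of cycles, so running `call_read` over every spot yields the set of reads handed to the rest of the pipeline.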

...the later steps in the pipeline involve using lots of complex math to reassemble the sequenced fragments back together, either using a reference assembly (such as the one produced by the Human Genome Project) or else de novo assembly (which basically _builds_ a reference assembly through lots and lots and lots of effort).
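For a feel of what de novo reassembly means, here is a toy Python sketch that greedily merges the two reads with the longest suffix/prefix overlap until nothing overlaps any more. This is a deliberately naive stand-in, not what production assemblers do; the function names and the `min_len` threshold are assumptions for the example, and it ignores sequencing errors, repeats, and reverse complements.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if too short)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # nothing overlaps; leftovers stay as separate contigs
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

print(greedy_assemble(["GATTAC", "TTACAG", "ACAGGT"]))  # ['GATTACAGGT']
```

With a reference assembly the problem is easier: instead of comparing reads against each other, each read is looked up against the known sequence to find where it fits.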

There are other technologies as well which I'm not so familiar with.

Look up Sanger sequencing.