by derefr 2162 days ago
> Repetitive sequences yield lots of short reads that look almost identical, like a large expanse of blue sky in a puzzle, with no clues to how the pieces fit together or how many repeats there are

Ah, https://en.wikipedia.org/wiki/Clock_recovery !

Too bad DNA isn't a run-length limited code. (Wouldn't that be something.)

1 comment

DNA is a code, but it's not just a code. It is also a really long molecule that has to bend and fold up in certain ways.

There are error-detecting codes, in a way. Proteins are encoded by 3-base codons, so if you insert or delete a number of bases that isn't a multiple of 3, everything downstream is read out of frame. The misaligned sequence will likely soon hit a stop codon, so the bad protein gets truncated, and the faulty mRNA is likely removed via nonsense-mediated decay.
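
To make the frameshift point concrete, here is a minimal Python sketch. The toy sequence and the helper name `first_stop_codon` are made up for illustration; only the three standard stop codons (TAA, TAG, TGA) are taken as given. It just scans in steps of 3 and reports where the first in-frame stop shows up.

```python
# Standard DNA stop codons (coding strand).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_stop_codon(seq, start=0):
    """Return the codon index of the first in-frame stop codon, or None.

    Hypothetical helper: reads `seq` in steps of 3 from `start` and
    ignores any trailing partial codon.
    """
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None

# Toy reading frame: ATG GAT AAC CGT TAG (stop is the 5th codon, index 4).
orig = "ATGGATAACCGTTAG"

# Insert one base after the start codon: the frame shifts and the
# sequence now reads ATG TGA ..., a premature stop at codon index 1.
inserted = orig[:3] + "T" + orig[3:]

# Delete a whole codon (3 bases): the frame is preserved, so the
# original stop is still the one that gets read, just one codon earlier.
deleted3 = orig[:3] + orig[6:]

print(first_stop_codon(orig))      # 4
print(first_stop_codon(inserted))  # 1
print(first_stop_codon(deleted3))  # 3
```

A premature stop is not guaranteed right after every frameshift; in this toy case the shifted frame happens to hit TGA immediately, but in general it only becomes likely over a longer stretch, since 3 of the 64 codons are stops.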