Hacker News
by vineetg 3062 days ago
Original author here.

You're right: conceptually, the CRISPR search problem and DNA sequence alignment are related. In both, you're looking for places where two (or more) sequences are very similar. I would say there are two major differences.

The first is the goal of the search. Alignment tools typically try to find the best positional alignment of two (or more) sequences, whereas the CRISPR search problem is to find every possible match above some similarity threshold.
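To make the difference in goals concrete, here is a toy sketch (not the tool's actual code). It assumes mismatch-only (Hamming) comparison, and the sequences and the `max_mismatches` threshold are made up. An alignment-style search returns the single best hit, while a CRISPR-style search has to return every position under the threshold.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(query: str, genome: str) -> tuple[int, int]:
    """Alignment-style goal: return only the best (position, mismatches)."""
    n = len(query)
    return min(
        ((i, hamming(query, genome[i:i + n])) for i in range(len(genome) - n + 1)),
        key=lambda hit: hit[1],
    )


def all_hits(query: str, genome: str, max_mismatches: int) -> list[tuple[int, int]]:
    """CRISPR-style goal: return every (position, mismatches) within the threshold."""
    n = len(query)
    hits = []
    for i in range(len(genome) - n + 1):
        mm = hamming(query, genome[i:i + n])
        if mm <= max_mismatches:
            hits.append((i, mm))
    return hits


genome = "ACGTACGTTTACGAACGTACGA"  # toy sequence, not real data
query = "ACGTACGA"
print(best_hit(query, genome))                    # (14, 0)
print(all_hits(query, genome, max_mismatches=1))  # [(0, 1), (14, 0)]
```

The near-miss at position 0 is invisible to the "best hit" view but is exactly what an off-target search needs to report.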

The second is that there are a few constraints on the CRISPR search problem that allow us to make it much faster than a general DNA sequence alignment tool:

1) We know that guide sizes tend to be very small (~20 bp).
2) Part of the target site must match exactly (the PAM site), which lets us restrict our search even further.
3) We don't need to worry about insertions or deletions in our search, only mismatches.
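A rough sketch of how those three constraints simplify the search, under stated assumptions rather than as the author's implementation: it assumes an SpCas9-style `NGG` PAM immediately 3' of a 20-nt protospacer, scans the forward strand only, and uses hypothetical function names and a made-up genome. The exact PAM check prunes most positions before any comparison happens, and because there are no indels, the comparison is a plain mismatch count that can stop early once it exceeds the threshold.

```python
from __future__ import annotations

GUIDE_LEN = 20  # assumption: ~20 bp guide


def pam_sites(genome: str, guide_len: int = GUIDE_LEN):
    """Yield protospacer start positions that are followed by an NGG PAM (forward strand only)."""
    for i in range(guide_len, len(genome) - 2):
        # The PAM occupies genome[i:i+3]; "N" matches any base, so only the two Gs are checked.
        if genome[i + 1] == "G" and genome[i + 2] == "G":
            yield i - guide_len


def mismatches_within(a: str, b: str, limit: int) -> int | None:
    """Count mismatches, giving up (returning None) as soon as `limit` is exceeded."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            count += 1
            if count > limit:
                return None
    return count


def off_targets(guide: str, genome: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every PAM-adjacent site within the threshold."""
    hits = []
    for start in pam_sites(genome, len(guide)):
        site = genome[start:start + len(guide)]
        mm = mismatches_within(guide, site, max_mismatches)
        if mm is not None:
            hits.append((start, mm))
    return hits


guide = "GACGTTACCGATTGCAAGTC"    # hypothetical 20-nt guide
mutated = "C" + guide[1:]         # same site with one mismatch
genome = "TTTT" + guide + "AGG" + "CCCC" + mutated + "CGG" + "TTTT" + guide + "TTT"
print(off_targets(guide, genome, max_mismatches=2))  # [(4, 0), (31, 1)]
```

Note that the third, perfect copy of the guide (at position 58) is never compared at all, because it has no PAM next to it.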

Using those three constraints, we can do this search a lot faster than a more general DNA alignment tool!


Honestly, you are taking a big risk designing a DNA search algorithm from scratch. It's akin to the risk people take when they roll their own crypto. There are aspects of this that you may not be considering, and it is usually better to build on the extensive existing work in the field than to assume it is a trivial problem.

How do you deal with natural variation in the genome? Can you be sure your gRNA doesn't target an essential locus in some percentage of people who carry a particular allele? The data to solve this is out there (1000 Genomes, for instance).
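One way to frame the allele question, as a minimal sketch: substitute known single-nucleotide variants into the reference and rerun the same search on each resulting haplotype, flagging sites that only appear in the variant. Everything here is an assumption for illustration: SNVs are given as hypothetical `(position, alt_base)` pairs rather than parsed from real 1000 Genomes VCF files, the search is forward-strand `NGG`, mismatches only, and the helper names are made up.

```python
from __future__ import annotations


def apply_snvs(reference: str, snvs: list[tuple[int, str]]) -> str:
    """Return a copy of `reference` with single-nucleotide variants substituted in."""
    seq = list(reference)
    for pos, alt in snvs:
        seq[pos] = alt
    return "".join(seq)


def hits(guide: str, genome: str, max_mismatches: int) -> set[int]:
    """Start positions of forward-strand NGG-adjacent sites within max_mismatches."""
    n = len(guide)
    found = set()
    for i in range(n, len(genome) - 2):
        if genome[i + 1:i + 3] == "GG":  # PAM is genome[i:i+3], N = any base
            start = i - n
            mm = sum(a != b for a, b in zip(guide, genome[start:i]))
            if mm <= max_mismatches:
                found.add(start)
    return found


def variant_only_hits(guide, reference, haplotypes, max_mismatches):
    """Map each haplotype name to target sites absent from the reference."""
    ref_hits = hits(guide, reference, max_mismatches)
    result = {}
    for name, snvs in haplotypes.items():
        extra = hits(guide, apply_snvs(reference, snvs), max_mismatches) - ref_hits
        if extra:
            result[name] = sorted(extra)
    return result


guide = "GACGTTACCGATTGCAAGTC"            # hypothetical guide
reference = "TTTT" + guide + "AGA" + "TTTT"  # perfect match, but no PAM in the reference
haplotypes = {
    "hap1": [(26, "G")],  # hypothetical SNV that turns AGA into an AGG PAM
    "hap2": [(0, "C")],   # hypothetical SNV with no effect on targeting
}
print(variant_only_hits(guide, reference, haplotypes, max_mismatches=2))  # {'hap1': [4]}
```

A site that is unreachable in the reference becomes a perfect target for carriers of the `hap1` allele, which is the kind of case the question above is about.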

Edit: excuse me, I appreciate that you are using a collection of whole genomes as your target. Will this reliably scale to thousands or millions of genomes, and to the likely recombinations between them?