|
|
|
|
|
by dasht
5604 days ago
|
|
Given a (not even very large) cluster, we (with the fairly brute force-ish but cleverly tuned NFA) can populate hash tables for the read-based NFA of your 2 billion needles just about as fast as you can transmit the data and give you optimal alignment (by flexible criteria) just about as fast as you can then stream through the 3 billion base pairs. We are down to, at most, single or double digit dollar differences either way, but mine is vastly simpler and easier to tweak. Probably, we can do my way cheaper as the commodity computing market improves. If you want to tell me that BWT is more useful for something like searching an FBI or CIA DNA database, and that intel types want to encourage commercial development of BWT (even if irrational for the customers) to subsidize its covert use in intel -- that I can believe. I can see where it would be helpful not for resequencing (a way to read off interesting details of an individuals genotype based on a lot of reads) but rather for fingerprinting of suspects and surveillance targets against a large database of reference genomes. |
|
As for the FBI's DNA database, this does not consist of sequence data, but, I believe, just microsatellite data. Even if they had full sequence data, it wouldn't make sense to search for new samples against the whole database, but to align the sample with a reference genome and then match the variations across a database of variations.