by epistasis
5605 days ago
You'd be pretty confident in the other direction once you understand the problem better. The BWT is not compression so much as an extremely clever way of rearranging the haystack. There are many different string alignment (not string search) algorithms that are useful with DNA, and in the cases where the BWT is used, your algorithm is not going to be in the realm of useful. BWT-based aligners look up each needle in time that is completely independent of the haystack length. When you have 2 billion needles of size 50-200, and the haystack is 3 billion long, it makes a ton of sense to pay the preprocessing cost of O(n lg n), since it only has to be done on the order of once a year.
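To make the "independent of the haystack length" point concrete, here is a rough Python sketch of exact matching over a BWT (the usual FM-index backward search). It is a toy under stated assumptions, not how any particular aligner is implemented: `build_fm_index` and `find` are names made up for this example, the index is built with a naive suffix sort rather than a real O(n lg n) construction, the occurrence table is stored uncompressed, and it only handles exact matches, whereas real aligners also have to tolerate mismatches and gaps.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index (hypothetical helper, naive construction)."""
    # Sentinel: assumed absent from text and smaller than every other character.
    text += "$"
    # Suffix array via naive sorting; fine for a demo, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in text strictly smaller than c.
    counts = Counter(text)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def find(needle, index):
    """Return sorted start positions of exact matches of needle."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    # One step per needle character, processed right to left.
    # Nothing in this loop depends on the haystack length.
    for c in reversed(needle):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("GATTACAGATTACA")  # pay the preprocessing cost once
print(find("TTAC", index))  # [2, 9]
print(find("CCC", index))   # []
```

The loop in `find` runs once per needle character and does two table lookups each time, so the lookup cost scales with the 50-200 length of the read, not with the 3 billion of the reference. All the haystack-dependent work is in `build_fm_index`, which is the part you only redo about once a year.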
|
If you want to tell me that BWT is more useful for something like searching an FBI or CIA DNA database, and that intel types want to encourage commercial development of BWT (even if irrational for the customers) to subsidize its covert use in intel, that I can believe. I can see where it would be helpful not for resequencing (a way to read off interesting details of an individual's genotype based on a lot of reads) but rather for fingerprinting suspects and surveillance targets against a large database of reference genomes.