Hacker News new | ask | show | jobs
by dasht 5609 days ago
I get that its the standard but see my comment above. If you can come close to I/O saturating inputing read data at only the cost of streaming the reference near I/O saturation, that's optimal -- and I don't think you need the hair of BWT to do that (and the BWT memory cost might even hurt you).

Its different for an imagined database where you are trying to align reads against a lot of personal genomes, e.g., to identify someone given a pretty complete genomic database of the usual suspects plus a few reads from the crime scene -- more around-the-corner "GATTACA" scenario than today's FBI, perhaps. There, perhaps, its economic to dedicate one server to ever N "usual suspects", loading the server with BWT of the suspect database, then stream reads out to the servers rather than loading servers with set of reads and streaming a single-human reference over that. (Of course, maybe you could just align the crime scene reads against a single human reference and then .... So even in GATTACA land use of BWT seems like dubious hair against a lightweight opponent like MAQ)