by xaa
4131 days ago
My colleague (an editor at the journal Bioinformatics) and I were joking the other day about how every other paper title, especially about aligners, seems to include the word "fast". This takes it to a new level. As a learning exercise, this is interesting and fine. I am trying very hard to suppress the inner "reviewer" right now. Walking away... not comparing this to existing algorithms, which are implemented in highly optimized C/C++, or CUDA, or even hardware. Not asking why other authors would go to such extraordinary lengths if high-level languages were suitable. Not going to ask how conclusions can be drawn about the suitability of JavaScript for very computationally intensive tasks without solving the actual alignment problem rather than a subset, let alone comparing to existing tools. Not going to ask about the application of the tool. It's not a paper. I'm breathing. OK.
The initial point of the tool / algorithm was to find all potential binding sites for a DNA-binding domain of a protein in a 100 kbp to 1 Mbp genome, even those with sequence identity of ~50% or less. This assumes you have a consensus sequence that contains ambiguous nucleotides (for example, one roughly discerned from a sequence logo). It quickly turned into the development of a general bioinformatics library in JavaScript, and a chance to see how far and how fast I could push V8 at doing these sequence comparisons.
I would love (at some point) to go into significantly more detail and compare what I've written here with existing tools. If you're willing to offer mentorship or guidance (or know somebody who would be), it would be fantastic to present the information in a more thoroughly peer-reviewed context. Otherwise, the post (and associated library) are meant primarily as learning tools for both biologists and developers.
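To make the binding-site search described above concrete, here is a minimal sketch of scanning a genome for windows that match a consensus sequence containing ambiguous nucleotides. It is not the library's actual code or API: the names `IUPAC`, `encode`, and `findSites` are hypothetical, and the approach shown (4-bit masks for the standard IUPAC codes, a plain sliding window, no gaps, forward strand only) is an assumption about one straightforward way to do it.

```javascript
// Hypothetical sketch; names are illustrative, not taken from the library.
// Each IUPAC nucleotide code is a 4-bit mask: A=1, C=2, G=4, T=8.
const IUPAC = {
  A: 1, C: 2, G: 4, T: 8,
  R: 1 | 4, Y: 2 | 8, S: 2 | 4, W: 1 | 8, K: 4 | 8, M: 1 | 2,
  B: 2 | 4 | 8, D: 1 | 4 | 8, H: 1 | 2 | 8, V: 1 | 2 | 4,
  N: 15,
};

// Convert a sequence string to a typed array of bitmasks.
// Unknown characters encode as 0 and therefore never match.
function encode(seq) {
  const out = new Uint8Array(seq.length);
  for (let i = 0; i < seq.length; i++) {
    out[i] = IUPAC[seq[i].toUpperCase()] || 0;
  }
  return out;
}

// Slide the consensus along the genome and report every window whose
// fraction of matching positions is at least minIdentity (0 to 1).
// Two positions match when their bitmasks share at least one base.
function findSites(genome, consensus, minIdentity) {
  const g = encode(genome);
  const c = encode(consensus);
  const hits = [];
  for (let i = 0; i + c.length <= g.length; i++) {
    let matches = 0;
    for (let j = 0; j < c.length; j++) {
      if (g[i + j] & c[j]) matches++;
    }
    const identity = matches / c.length;
    if (identity >= minIdentity) hits.push({ position: i, identity });
  }
  return hits;
}
```

For example, `findSites("ACGT", "RY", 1)` reports positions 0 and 2, since R covers A or G and Y covers C or T. Lowering `minIdentity` to around 0.5 corresponds to the "~50% or less" case mentioned above, at the cost of many more candidate sites to sort through.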