by keithwhor
4131 days ago
This was certainly not intended as an academic submission, merely a description of a method used. You can breathe. :)
The initial point of the tool / algorithm was to find all potential binding sites for the DNA-binding domain of a protein in a 100 kbp to 1 Mbp genome, even those with sequence identity of ~50% or less. This assumes you have a consensus sequence that contains ambiguous nucleotides (for example, one roughly discerned from a sequence logo). It quickly turned into the development of a general bioinformatics library in JavaScript, and a chance to see how far and how fast I could push V8 at doing these sequence comparisons.
I would love (at some point) to go into significantly more detail and compare what I've written here with existing tools. If you're willing to offer mentorship or guidance (or know somebody who would be), it would be fantastic to present the information in a more thoroughly peer-reviewed context. Otherwise, the post (and associated library) are meant primarily as learning tools for both biologists and developers.
Your initial problem, if you frame it as the desire to simply enumerate all the degenerate sequences and loci, could be solved any number of ways as other commenters have mentioned. Probably I would reach first for a regex. But sure, no crime in learning to implement a new algorithm while also testing the limits of V8. Probably half of my grad school time was spent that way ;)
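To make the regex suggestion concrete, here is a minimal sketch of what I have in mind. It is not taken from the library under discussion: `IUPAC`, `consensusToRegex` and `findSites` are names I made up for illustration. It expands the standard IUPAC ambiguity codes into character classes and uses a lookahead so that overlapping sites are reported too. Note that it only finds windows that fully satisfy the degenerate consensus; it does no partial-identity scoring.

```javascript
// Standard IUPAC nucleotide codes mapped to regex character classes.
const IUPAC = {
  A: "A", C: "C", G: "G", T: "T",
  R: "[AG]", Y: "[CT]", S: "[CG]", W: "[AT]", K: "[GT]", M: "[AC]",
  B: "[CGT]", D: "[AGT]", H: "[ACT]", V: "[ACG]", N: "[ACGT]",
};

// Hypothetical helper: turn a degenerate consensus into a global RegExp.
function consensusToRegex(consensus) {
  const body = [...consensus.toUpperCase()]
    .map((code) => {
      if (!(code in IUPAC)) throw new Error(`Unknown IUPAC code: ${code}`);
      return IUPAC[code];
    })
    .join("");
  // Wrap the pattern in a zero-width lookahead with a capture group so
  // that overlapping matches are all enumerated, not just disjoint ones.
  return new RegExp(`(?=(${body}))`, "g");
}

// Hypothetical helper: list every locus (0-based) where the consensus fits.
function findSites(genome, consensus) {
  const re = consensusToRegex(consensus);
  const hits = [];
  for (const m of genome.toUpperCase().matchAll(re)) {
    hits.push({ index: m.index, match: m[1] });
  }
  return hits;
}

// Usage: W means A or T, so all three ACG[AT] windows are reported.
console.log(findSites("ACGTACGAACGT", "ACGW"));
```

This handles one strand only; for the reverse strand you would run it again on the reverse complement of either the genome or the consensus.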
I think your best bet if you wanted to publish would be to find a use case for in-browser alignment. It would be hard to answer the obvious question, "why not server-side?" though. But who knows, people fairly often take old bioinformatics algorithms and say "but now you can do it on your phone!!". And they do publish.
But as for matching both the speed and accuracy of state-of-the-art aligners with JavaScript, it is my considered, scholarly opinion that you have no chance in hell. So you shouldn't present it that way. A comparison (at least for speed) would be unnecessary if you weren't making claims about speed.