by keithwhor 4131 days ago
Hmm, while I understand the problem of gapping is traditionally the hard part, I'm under the impression that the argument you're putting forward is primarily one of semantics.

"In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences." (From the article you linked.)

Gapped sequence alignment is certainly far more robust than ungapped alignment, and more biologically relevant when comparing sequences across organisms (insertions are a common error). It is also a much harder problem. But as for the definition of "alignment" itself, I don't believe I've misnamed anything here.

If we're going to be overly pedantic about the use of the word "alignment" that's fine, but I'm not sure it's a worthwhile debate to have. A quick search for "ungapped sequence alignment" returns a great many results on Google [1]. So if I am mistaken, I'm certainly not the first (nor do I believe I'll be the last).

Furthermore, there's nothing preventing anybody from using the methods described here to implement an ungapped sequence alignment tool that outperforms tools that only use string comparisons. :)

[1] https://www.google.com/search?q=ungapped%20sequence%20alignm....


you shouldn't dismiss this as an argument over semantics if your understanding of the term differs from researchers' use of the term. if you introduce "a fast tool for XYZ" and researchers understand XYZ to mean A, where you understand it to mean B, then the tool is not useful for researchers to perform what they know as XYZ.

tools like BLAST are extremely sophisticated and have been under development for decades, and I'm fairly confident they've moved past naive string comparisons by now.

Fair. Though I'm not convinced "ungapped sequence alignment" is particularly confusing to a researcher, considering there are tools and papers that have existed for decades using this description [1][2][3]. Though the algorithm described in my article is extremely focused on raw performance (and relatively naive with scoring), I would still choose to categorize it as primarily a tool that deals with ungapped sequence alignment, specifically supporting IUPAC degenerate nucleotide sequences. Thus, I believe the initial argument is, indeed, overly pedantic.

And to be clear, nowhere am I comparing what I've developed to BLAST. (They have very different applications.)

[1] http://schneider.ncifcrf.gov/paper/malign/

[2] http://www.ncbi.nlm.nih.gov/pubmed/9697204

[3] http://www.ncbi.nlm.nih.gov/pubmed/15130540