by mbreese
4131 days ago
If you're looking for a DNA binding sequence, you might be using the wrong algorithm here... Have you thought of using a position weight matrix (PWM)[1]? That's how I've always searched for binding sites, since that's how motifs are usually described. A PWM still lets you match ambiguous sequences, but it favors some bases over others at each position.
[1] http://en.wikipedia.org/wiki/Position_weight_matrix

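For anyone who hasn't used one, here is a minimal sketch of PWM scoring in Python. The example sites, the uniform 0.25 background, the pseudocount of 1, and the function names (`build_pwm`, `score`, `best_hit`) are all assumptions made up for illustration, not something from the article or from any particular library.

```python
import math

BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 1.0   # assumed; keeps unseen bases from scoring log(0)

def build_pwm(sites):
    """Build a log-odds matrix (one dict per position) from aligned sites."""
    total = len(sites) + 4 * PSEUDOCOUNT
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + PSEUDOCOUNT) / total) / BACKGROUND)
            for base in "ACGT"
        })
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for a window as wide as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_hit(pwm, sequence):
    """Slide the PWM along the sequence and return (best score, start index)."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )

# Hypothetical aligned binding sites, invented for this example.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = build_pwm(sites)
hit_score, hit_pos = best_hit(pwm, "CCGTGACTCAGG")
```

Here the exact consensus `TGACTCA` scores highest, the one-off variant `TGAGTCA` scores a bit lower rather than being rejected outright, and unrelated windows score much lower. That graded behaviour is the "favor some bases more than others" part.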
Think about a length-20 sequence: if only 5 in 100 identified high-affinity 20-mers contained a "C" at position 0, and the others all contained a "G", do I really care about the weights, or can I approximate that position as a "G" and still get roughly the same results?
(Though it would be interesting to apply a PWM to the top [x] results of this algorithm, once completed, to specify exact rank.)
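To put rough numbers on the position-0 example above, here is a small sketch. It assumes a uniform 0.25 background and no pseudocounts, neither of which is stated in the thread, and the variable names are made up for illustration.

```python
import math

background = 0.25        # assumed uniform base frequency
p_g, p_c = 0.95, 0.05    # 95 of 100 sites have G at position 0, 5 have C

# Log-odds weights a PWM would assign at this position.
score_g = math.log2(p_g / background)   # roughly +1.93 bits
score_c = math.log2(p_c / background)   # roughly -2.32 bits

# Fraction of known sites a strict "G only" match would skip at this position.
missed_fraction = p_c
```

So at this one position, treating it as a plain "G" drops the 5% of known sites that carry a "C", while a PWM would keep them with a penalty of about 4 bits relative to "G". Whether that difference matters depends on how many positions are approximated this way.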