by aschreyer
4788 days ago
The Ultrafast Shape Recognition (USR) algorithm is a very simple yet interesting application used in drug discovery. I tried speeding up the similarity calculation part with Numba. The NumPy implementation looks roughly like this:

    import numpy as np

    def usr(X, y, S=0.9, N=10):
        # similarity = 1 / (1 + mean absolute difference over the 12 descriptors)
        scores = 1.0 / (1.0 + 1/12.0 * np.abs(X - y).sum(axis=1))
        # keep only scores at or above the cutoff S
        scores = scores[scores >= S]
        scores.sort()
        # top N scores, highest first
        return scores[-N:][::-1]

Here X.shape could be (2000000, 12) (or more rows) and y.shape is (12,). The idea is to retrieve the top N most similar hits above a similarity score of S.
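Note that the snippet above returns only the scores, not which rows they came from. If you also want the row indices of the hits, one possible variant looks like the sketch below. The name `usr_top_hits` and the tiny test data are made up for illustration; the scoring formula is the same as in the snippet above.

```python
import numpy as np

def usr_top_hits(X, y, S=0.9, N=10):
    """Hypothetical variant: return (row indices, scores) of the top N hits."""
    # Same score as above: 1 / (1 + mean absolute difference over 12 descriptors)
    scores = 1.0 / (1.0 + np.abs(X - y).sum(axis=1) / 12.0)
    # Indices of rows at or above the cutoff
    keep = np.flatnonzero(scores >= S)
    # Sort the surviving rows by score, highest first, and keep at most N
    order = keep[np.argsort(scores[keep])[::-1][:N]]
    return order, scores[order]

# Tiny made-up example: three "molecules" compared against an all-zero query
y = np.zeros(12)
X = np.vstack([
    np.zeros(12),      # identical to y, score 1.0
    np.full(12, 0.1),  # L1 distance 1.2, score 1 / 1.1 (about 0.909)
    np.full(12, 1.0),  # L1 distance 12, score 0.5, below the cutoff
])
idx, top = usr_top_hits(X, y, S=0.9, N=2)
print(idx, top)  # rows 0 and 1, in that order
```

With 2 million rows the argsort only runs over the rows that pass the cutoff, so the cost is still dominated by computing the scores.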
The numba code isn't as pretty as it could be because slicing doesn't work for overlapping memory regions or wraparound indexing yet, and we don't have inlining :(
Here's what I get on a 2.6 GHz Intel Core i7.
I rewrote your code to minimize memory traffic, then jitted it with numba:
For the case aschreyer is interested in, I see a 24x speedup, from half a second to two hundredths of a second. For a really big problem (2 x 10^7 rows), numba is still well under a second and the numpy code is starting to really suffer. My full code is here: https://gist.github.com/ahmadia/5550933
I'm putting it into a wakari notebook so you can actually check me on this :)
Edit 1 - Made the speedup a little more comprehensible (and fixed gist)
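For readers who can't open the gist, here is a rough sketch of what "minimize memory traffic" can mean for this problem. It is not the code from the gist. It assumes the idea is to fuse the subtraction, absolute value, sum and threshold into a single pass over X so that no (rows x 12) temporaries are allocated. The function names are hypothetical, and the `try`/`except` fallback is only there so the sketch still runs (slowly) without numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: fall back to plain Python if numba is not available
    def njit(func):
        return func

@njit
def usr_scores_above(X, y, S):
    """Single pass over X: compute each score and keep it only if >= S."""
    n, d = X.shape
    out = np.empty(n)
    count = 0
    for i in range(n):
        dist = 0.0
        for j in range(d):
            dist += abs(X[i, j] - y[j])
        score = 1.0 / (1.0 + dist / 12.0)
        if score >= S:
            out[count] = score
            count += 1
    return out[:count]

def usr_fused(X, y, S=0.9, N=10):
    """Hypothetical wrapper: top N scores at or above S, highest first."""
    scores = usr_scores_above(X, y, S)
    scores.sort()
    return scores[-N:][::-1]

# Small made-up check against the plain NumPy formula
rng = np.random.default_rng(0)
X = rng.random((1000, 12))
y = X[0].copy()
fused = usr_fused(X, y, S=0.5, N=5)
ref = 1.0 / (1.0 + np.abs(X - y).sum(axis=1) / 12.0)
ref = np.sort(ref[ref >= 0.5])[-5:][::-1]
print(np.allclose(fused, ref))
```

If you time this yourself, call the function once before timing: with numba, the first call includes JIT compilation.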