by syllogism 4258 days ago
Here's how I would write this, in Cython using "pure C" arrays:

https://gist.github.com/syllog1sm/3dd24cc8b0ad925325e1

It's getting 18,000 steps/second, in the same ballpark as your C code.

I prefer to "write C in Cython" because I find it easier to read than the numpy code. That may just be my bias, though: I've been writing almost nothing but Cython for about two years now.

Btw, if anyone's interested, "cymem" is a small library I have on PyPI (pip install cymem). It's used to tie memory to a Python object's lifetime. All it does is remember which addresses it gave out, and when your Pool is garbage collected, it frees that memory.
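To make the idea concrete, here is a minimal sketch of that pattern. It is not cymem's code or API: it's a hypothetical pure-Python analogue that uses ctypes to call the C library's calloc/free directly, and it assumes a POSIX system (Linux, macOS) where `ctypes.CDLL(None)` exposes those functions. The class name `Pool` and its `alloc`/`free` methods are illustrative only.

```python
import ctypes
import weakref

# Assumption: POSIX system where CDLL(None) exposes libc's calloc/free.
# Not portable to Windows as written.
_libc = ctypes.CDLL(None)
_libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
_libc.calloc.restype = ctypes.c_void_p
_libc.free.argtypes = [ctypes.c_void_p]
_libc.free.restype = None


def _free_all(addresses):
    """Free every address still recorded, then forget them."""
    for addr in addresses:
        _libc.free(addr)
    addresses.clear()


class Pool:
    """Hypothetical analogue of a pool that ties C memory to its own lifetime."""

    def __init__(self):
        self._addresses = set()  # every address handed out and not yet freed
        # The finalizer holds the set, not self, so it can fire when the
        # Pool object itself is garbage collected.
        self._finalizer = weakref.finalize(self, _free_all, self._addresses)

    def alloc(self, number, elem_size):
        addr = _libc.calloc(number, elem_size)  # zero-initialised block
        if addr is None:  # calloc returned NULL
            raise MemoryError()
        self._addresses.add(addr)
        return addr

    def free(self, addr):
        # Optional early release; anything left is freed with the Pool.
        self._addresses.remove(addr)
        _libc.free(addr)
```

Usage: `ptr = pool.alloc(10, ctypes.sizeof(ctypes.c_int))`, then `ctypes.cast(ptr, ctypes.POINTER(ctypes.c_int))` gives a C-style int array. In CPython, dropping the last reference to the pool runs the finalizer and frees every outstanding block, so pointers obtained from it must not outlive it.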

Edit: here's a GitHub fork with code to compile and run the Cython version: https://github.com/syllog1sm/python-numpy-c-extension-exampl... . I hacked his script together quickly.


With your permission, I'd like to incorporate this into a future post in the series. I can credit you in any way you'd like.
That's fine; please link to http://honnibal.wordpress.com . I'll probably write a short note on it myself, since I've been meaning to say more about "my way" of using Cython.

Edit: Submitted here. https://news.ycombinator.com/item?id=8483872

I think if you used memory views, you could have the benefits of both: fast low-level access on one hand, and the vectorized numpy functions on the other (I'm thinking here of initializing the arrays with a single call to numpy.random.uniform). With multi-dimensional arrays, memory views are definitely better than plain pointers.
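The multi-dimensional point can be sketched without Cython. In Cython a typed memoryview would be declared as something like `double[:, :] grid`; the Python below uses the built-in `memoryview` as a stand-in (an assumption for illustration, not the code from the gist) to contrast pointer-style offset arithmetic with a shaped view over the same buffer.

```python
from array import array

ROWS, COLS = 3, 4

# "Plain pointer" style: one flat buffer of C doubles, and every access
# computes the row-major offset (row * COLS + col) by hand.
flat = array('d', [0.0] * (ROWS * COLS))
for i in range(ROWS):
    for j in range(COLS):
        flat[i * COLS + j] = i * 10 + j

# Memory-view style: the same bytes viewed with a 2-D shape, so the
# offset arithmetic is done for us. A 1-D -> N-D cast must go via bytes.
grid = memoryview(flat).cast('B').cast('d', shape=[ROWS, COLS])

# Both index the same memory.
assert grid[2, 3] == flat[2 * COLS + 3]
```

Because `grid` is a view rather than a copy, writes through `flat` are visible through `grid` immediately.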
Well, as you know, if I'm using a multi-dimensional array, it's usually super sparse! (Because NLP). So I want to define those myself, not use the numpy ones.
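For a sense of why a dense array is a poor fit when almost every cell is zero, here is a minimal dict-of-keys sketch. The `SparseMatrix` class is hypothetical and only illustrates the general idea of storing non-zero cells; it is not the author's own data structure.

```python
class SparseMatrix:
    """Hypothetical dict-of-keys sparse 2-D array: only non-zero cells are stored."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)
        self._cells = {}  # (row, col) -> value, for non-zero cells only

    def __getitem__(self, key):
        # Missing cells read as zero.
        return self._cells.get(key, 0.0)

    def __setitem__(self, key, value):
        row, col = key
        if not (0 <= row < self.shape[0] and 0 <= col < self.shape[1]):
            raise IndexError(key)
        if value == 0.0:
            self._cells.pop(key, None)  # keep the structure sparse
        else:
            self._cells[key] = value

    def nnz(self):
        """Number of stored (non-zero) cells."""
        return len(self._cells)
```

A 1,000,000 x 50 matrix of doubles would need about 400 MB stored densely, while this version's memory grows only with the number of non-zero cells. `m[row, col] += 0.5` works as well, since it reads through `__getitem__` and writes through `__setitem__`.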

Maybe I just never learned numpy. But I had to go and look up what that stuff did, and it wasn't obvious to me what the data types of those arrays would be. So I actually like the C-style initialization, just because it's more obvious to me.
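A small Python sketch of the contrast, assuming numpy is installed and using `array.array` as a stand-in for a C array. The one-call numpy version never names its element type (it happens to be float64), while the C-style version states the type (`'d'`, a C double) before filling the buffer in an explicit loop. The variable names are illustrative only.

```python
import random
from array import array

import numpy as np

N = 5

# numpy style: one vectorized call; the element type is implicit
# (numpy.random.uniform produces float64 values).
positions_np = np.random.uniform(-1.0, 1.0, size=N)

# "C-style" style: declare the element type up front ('d' = C double),
# then fill the buffer with an explicit loop.
positions_c = array('d', [0.0] * N)
for i in range(N):
    positions_c[i] = random.uniform(-1.0, 1.0)
```

Checking `positions_np.dtype` confirms float64, but you have to know to look. In the C-style version the type is visible where the buffer is declared.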