by echlebek 3856 days ago
Once you've exhausted all the low-hanging fruit, like people calling .keys() on dicts, or doing unnecessary linear searches, Cython really starts to shine. I've seen it perform ~40 times better than pure Python in time-consuming loops.

We do scientific computing at my company. Numpy does 90% of the work, but there are some algorithms that just aren't easily expressed with arrays. That's where Cython comes in.

2 comments

> Numpy does 90% of the work

Numpy and scipy have been the core of a huge amount of my optimisations. The first question I try and ask is

"Could this be solved with matrix multiplications and summing?"

Often the answer is "yes", which lets you group a huge number of calculations together and use the heavily optimised code available in numpy/scipy.
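
As a rough illustration of that question, here is a minimal sketch in which a per-row Python loop is replaced by a single matrix-vector product. The function names, the data shape and the weights are all made up for the example; this is not the code from the anecdote, just one small instance of the "matrix multiplications and summing" idea.

```python
import numpy as np


def weighted_sums_loop(rows, weights):
    """Row-by-row version: every multiply-add happens in Python bytecode."""
    out = []
    for row in rows:
        total = 0.0
        for value, weight in zip(row, weights):
            total += value * weight
        out.append(total)
    return out


def weighted_sums_vectorised(rows, weights):
    """Same calculation expressed as one matrix-vector product."""
    return np.asarray(rows) @ np.asarray(weights)


# Hypothetical data: 1000 rows with 8 columns each, plus 8 weights.
rng = np.random.default_rng(0)
rows = rng.random((1000, 8))
weights = rng.random(8)

slow = weighted_sums_loop(rows, weights)
fast = weighted_sums_vectorised(rows, weights)

# Both versions should agree up to floating-point rounding.
assert np.allclose(slow, fast)
```

How much faster the vectorised form is depends on the data size and the machine, so it is worth timing both versions on your own workload.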

I recently swapped out something that was calculating about 100 rows/second for a version that gets through about half a million rows in about 0.2s.

In fairness, I should point out that the really slow version was also written by me :)
Fantastic!
Can you explain the context in which .keys() is called often and is not appropriate and the alternative?
.keys() returns a list (in python2) so if you write

  for k in dict.keys():
    ...
then python first builds a list of all the keys, loops through them and then throws away the list. If the dict is large, this can be quite expensive. The better way is to either use .iterkeys(), which returns an iterator that generates the keys one at a time, or simply iterate directly over the dict, saving you the need to first copy all the keys into a list you'll just throw away.

This has been 'fixed' in python3: .keys() now returns an iterable view of the keys, and if you actually want a list of the keys you have to be explicit and write list(dict.keys())

The easiest is to never use .keys() or .iterkeys(), and always iterate over the bare dict:

    for k in dict:
        ...

    if k in dict:
        ...
If you do need a list of keys, list(dict) has the advantage of working in both Python 2 and 3.
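
A small sketch of those three idioms together, using a made-up dict called prices (the name and contents are only for illustration):

```python
prices = {"apple": 3, "pear": 5, "plum": 2}  # hypothetical example data

# Iterating over the dict itself yields its keys; no intermediate list is built.
seen = []
for k in prices:
    seen.append(k)

# Membership tests on the bare dict are hash lookups, not a scan of a key list.
has_pear = "pear" in prices
has_kiwi = "kiwi" in prices

# An explicit list of keys, for when one is really needed (sorting, indexing).
key_list = list(prices)
```

Key order is not guaranteed in Python 2, so sort key_list if the order matters.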
Just to clarify your python 2 note: In python 3 keys() returns a view object, which is iterable without copying the keys, so there is no penalty (i.e. iterkeys was dropped, and keys() now behaves much like python 2's viewkeys()).

The equivalent python2 behaviour can be obtained using list(somedict.keys())
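
To make the difference concrete, here is a short Python 3 sketch (the dict contents are invented for the example). Strictly speaking, what keys() hands back is a live view: it tracks later changes to the dict and can be looped over more than once, whereas list(somedict.keys()) is a one-off copy.

```python
somedict = {"a": 1, "b": 2}  # hypothetical example data

view = somedict.keys()            # Python 3: a dict view, no keys are copied
snapshot = list(somedict.keys())  # explicit copy, like Python 2's .keys()

somedict["c"] = 3                 # mutate the dict after taking both

view_len = len(view)              # the view sees the new key
snapshot_len = len(snapshot)      # the copy does not

# A view can be iterated repeatedly, but it is not itself an iterator.
first_pass = list(view)
second_pass = list(view)
is_iterator = hasattr(view, "__next__")
```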