Hacker News
by krit_dms 3186 days ago
So you prototyped in pandas, and built production code around numpy arrays?
3 comments

Production wouldn't usually be in Python, but if it was, it'd probably be numpy (if it was numerical). numpy is also fairly heavy (we'd usually exclude MKL for that reason), but less 'smart' (fewer defaults, more explicit in most places), so it's a lot safer.
That's what we have done (algo trading). Our research backend uses pandas, but we ended up taking about a month removing it from prod code. pandas does surprising things with memory usage, and the functionality we needed was more or less wrappers around numpy anyway. Most of our performance-critical code is in Cython as well. For this trading application, speed obviously isn't the biggest concern, so Python + numpy is fine. It is C++/Java everywhere else though.
Any opinions on Cython vs. numba? Especially now that numba has GPU acceleration.
Never tried numba. I write all of our CUDA stuff by hand anyway, and wrap that into Cython from C++ where needed.
We went through sort of a similar exercise. The features that pandas provided were compelling. For example, our main research guy uses R, so something like data frames was wanted. My conclusion, however, was that pandas was too heavy to add as a dependency. sloccount says about 200k lines of code.

Instead, I wrote a small wrapper around numpy to provide a data frame like object (850 lines of code by sloccount). So far, this has worked well for us.
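
To give a rough idea of what such a wrapper can look like, here is a minimal sketch. It is not the code described above; the `MiniFrame` class, its methods, and the column names are all hypothetical, and it only assumes that a "data frame like object" means named, equal-length numpy columns with column and row selection.

```python
import numpy as np


class MiniFrame:
    """Hypothetical data-frame-like wrapper: named, equal-length numpy columns."""

    def __init__(self, columns):
        # Convert each column to a numpy array up front; dtype comes from np.asarray.
        self._cols = {name: np.asarray(values) for name, values in columns.items()}
        lengths = {len(col) for col in self._cols.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._nrows = lengths.pop() if lengths else 0

    def __len__(self):
        return self._nrows

    @property
    def names(self):
        return list(self._cols)

    def __getitem__(self, key):
        # A string selects one column (returned as the underlying numpy array).
        if isinstance(key, str):
            return self._cols[key]
        # Anything else (boolean mask, slice, index array) selects rows in every column.
        return MiniFrame({name: col[key] for name, col in self._cols.items()})


# Example usage with made-up data.
frame = MiniFrame({"price": [10.0, 11.5, 9.75], "qty": [100, 250, 75]})
cheap = frame[frame["price"] < 11.0]      # row filter via a boolean mask
notional = cheap["price"] * cheap["qty"]  # plain numpy arithmetic on columns
```

Everything here is explicit: there is no index alignment and no implicit dtype conversion beyond what `np.asarray` does, which is roughly the trade-off against pandas discussed in this thread.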