| HN Mirror



	by krit_dms 3186 days ago
	So you prototyped in pandas, and built production code around numpy arrays?

3 comments

drej 3186 days ago

Production wouldn't usually be in Python, but if it was, it'd probably be numpy (if it was numerical). numpy is also fairly heavy (we'd usually exclude MKL for that reason), but it's less 'smart' than pandas (fewer defaults, more explicit in most places), so it's a lot safer.


jawilson2 3186 days ago

That's what we have done (algo trading). Our research backend uses pandas, but we ended up taking about a month removing it from prod code. It does surprising things with memory usage, and the functionality we needed was more or less wrappers around numpy anyway. Most of our performance-critical code is in cython as well. For this trading application, speed obviously isn't the biggest concern, so python+numpy is fine. It is C++/Java everywhere else though.


sandGorgon 3186 days ago

Any opinions on cython vs numba? Especially now that numba has GPU acceleration.


jawilson2 3186 days ago

Never tried numba. I write all of our cuda stuff by hand anyway, and wrap that into cython from c++ where needed.


nas 3186 days ago

We went through sort of a similar exercise. The features that pandas provided were compelling. For example, our main research guy uses R, so something like data frames was wanted. My conclusion, however, was that pandas was too heavy to add as a dependency: sloccount says about 200k lines of code.

Instead, I wrote a small wrapper around numpy to provide a data-frame-like object (850 lines of code by sloccount). So far, this has worked well for us.
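
For readers wondering what such a wrapper might look like, here is a minimal sketch. It is not nas's code (which isn't shown in the thread); the `Frame` class, its methods, and the sample data are all hypothetical, and it only assumes numpy is installed. It covers just two data-frame basics: named columns and row selection.

```python
import numpy as np


class Frame:
    """Hypothetical data-frame-like wrapper: named, equal-length 1-D numpy columns."""

    def __init__(self, columns):
        # Convert every column to a numpy array up front.
        self._cols = {name: np.asarray(values) for name, values in columns.items()}
        if any(arr.ndim != 1 for arr in self._cols.values()):
            raise ValueError("every column must be one-dimensional")
        if len({arr.shape[0] for arr in self._cols.values()}) > 1:
            raise ValueError("all columns must have the same length")

    @property
    def columns(self):
        return list(self._cols)

    def __len__(self):
        # Number of rows; an empty frame has zero rows.
        if not self._cols:
            return 0
        return next(iter(self._cols.values())).shape[0]

    def __getitem__(self, key):
        # A string selects one column and returns the underlying array.
        if isinstance(key, str):
            return self._cols[key]
        # A boolean mask, slice, or index array selects rows from every column.
        return Frame({name: arr[key] for name, arr in self._cols.items()})


# Made-up sample data, purely for illustration.
prices = Frame({"symbol": ["A", "B", "C"], "price": [10.0, 12.5, 9.0]})
cheap = prices[prices["price"] < 11.0]  # rows where price is below 11
```

Everything here is explicit: there is no index alignment, no dtype inference beyond what `np.asarray` does, and no hidden copies other than numpy's own indexing rules, which is roughly the trade-off the comments above describe.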
